DISTRIBUTED LINEAR PARAMETER ESTIMATION: 
ASYMPTOTICALLY EFFICIENT ADAPTIVE STRATEGIES 
Abstract. The paper considers the problem of distributed adaptive linear parameter estimation
in multi-agent inference networks. Local sensing model information is only partially available at 
the agents and inter-agent communication is assumed to be unpredictable. The paper develops a 
generic mixed time-scale stochastic procedure consisting of simultaneous distributed learning and 
estimation, in which the agents adaptively assess their relative observation quality over time and 
fuse the innovations accordingly. Under rather weak assumptions on the statistical model and the 
inter-agent communication, it is shown that, by properly tuning the consensus potential with respect 
to the innovation potential, the asymptotic information rate loss incurred in the learning process 
may be made negligible. As such, it is shown that the agent estimates are asymptotically efficient, 
in that their asymptotic covariance coincides with that of a centralized estimator (the inverse of the 
centralized Fisher information rate for Gaussian systems) with perfect global model information and 
having access to all observations at all times. The proof techniques are mainly based on conver- 
gence arguments for non-Markovian mixed time scale stochastic approximation procedures. Several 
approximation results developed in the process are of independent interest. 

Key words. Multi-Agent Systems, Distributed Estimation, Mixed time scale, Stochastic
approximation, Asymptotically Efficient, Adaptive Algorithms.

1. Introduction. 

1.1. Background and Motivation. Recent advances in sensing and communi- 
cation technologies have enabled the proliferation of heterogeneous sensing resources 
in multi-agent networks, typical examples being cyberphysical systems and distributed 
sensor networks. Due to the large size of these networks and the presence of geograph- 
ically spread resources, distributed information processing and optimization (see, for 
example, [1], [2]) techniques are gaining prominence. They not only offer a robust alternative
to fusion center based centralized approaches, but also lead to efficient use of the
network resources by distributing the computing and communication burden among 
the agents. A key challenge in such distributed processing involves the lack of global 
(sensing) model information at the local agent level. Moreover, the systems under 
consideration are dynamic, often leading to uncertainty in the spatial distribution of 
the information content. The performance of existing distributed information processing
and optimization schemes (see, for example, [3]-[17]) based on accurate knowledge
of the sensed data statistics may suffer substantially in the face of such parametric 
uncertainties. This necessitates the development of adaptive schemes that learn the 
model parameters over time in conjunction with carrying out the desired information 
processing task. 

Motivated by the above, in this paper we focus on the problem of distributed 
recursive least squares parameter estimation, in which the agents have no prior knowledge
of the global sensing model and of the individual observation qualities as measured
in terms of the signal to noise ratio (SNR). Our goal is to develop an adaptive
distributed scheme that is asymptotically efficient, i.e., achieves the same estimation
performance at each agent (in terms of asymptotic covariance) as that of a (hypothetical)
centralized fusion center with perfect global model information and having
access to all agents' observations at all times. To this end, we develop a consensus + innovation
scheme [6], in which the agents collaborate by exchanging (appropriate)
messages with their neighbors (consensus) and fusing the acquired information
with the new local observation (innovation). Apart from the issue of optimality, the 
inter-agent collaboration is necessary for estimator consistency, as the local observa- 
tions are generally not rich enough to guarantee global observability. Lacking prior 
global model and local SNR information, the innovation gains at the agents are not op- 
timal a priori, and the agents simultaneously engage in a distributed learning process 
based on past data samples with the aim of recovering the optimal gains asymptot- 
ically. Thus the distributed learning process proceeds in conjunction and interacts 
with the estimate update. Intuitively, the overall update scheme has the structure
of a certainty-equivalent control system (see, for example, [18], [19] and the references
therein, in the context of parameter estimation), the key difference being the distributed
nature of the learning and estimation tasks. Under rather weak assumptions
on the inter-agent communication (network connectivity on average) we show that, 
by properly tuning the consensus potential with respect to the innovation potential, 
the asymptotic information rate loss incurred in the learning process may be made 
negligible, and the agent estimates are asymptotically efficient in that their asymptotic
covariances coincide with that of the hypothetical centralized estimator. The
proper tuning of the persistent consensus and innovation potentials is necessary for
this optimality, leading to a mixed time-scale stochastic procedure. In this context,
we note the study of mixed time-scale stochastic procedures that arise in algorithms of
the simulated annealing type (see, for example, [20]). Apart from being distributed,
our scheme technically differs from [20] in that, whereas the additive perturbation
in [20] is a martingale difference sequence, ours is a network dependent consensus
potential manifesting past dependence. In fact, intuitively, a key step in the analysis 
is to derive pathwise strong approximation results to characterize the rate at which 
the consensus term/process converges to a martingale difference process. We also 
emphasize that our notion of mixed time-scale is different from that of stochastic algorithms
with coupling (see [21], [22]), where a quickly switching parameter influences
the relatively slower dynamics of another state, leading to averaged dynamics. Mixed
time scale procedures of this latter type arise in multi-scale distributed information
diffusion problems; see, in particular, the paper [23] that studies interactive consensus
formation in Markov modulated switching networks.

We comment on the main technical ingredients of the paper. Due to the mixed 
time-scale behavior and the non-Markovianity (induced by the learning process that 
uses all past information), the stochastic procedure does not fall under the purview
of standard stochastic approximation (see, for example, [24]) or distributed stochastic
approximation (see, for example, [61^-31 ) procedures. As such, we develop several 
intermediate results on the pathwise convergence rates of mixed time-scale stochastic 
procedures. Some of these tools are of independent interest and general enough to be 
applicable to other distributed adaptive information processing problems. 

We briefly summarize the organization of the rest of the paper. Section 1.2
presents notation to be used throughout. The abstract problem formulation and
the mixed time-scale distributed estimation scheme are stated and discussed in Sections
2.1 and 2.2 respectively. The main results of the paper are stated in Section 3,
whereas Section 4 presents some intermediate convergence results on recursive stochastic
schemes. The key technical ingredients concerning the asymptotic properties of
the distributed learning and estimation processes are obtained in Section 5, while the
main results of the paper are proved in Section 6. Finally, Section 7 concludes the
paper.

1.2. Notation. We denote the $k$-dimensional Euclidean space by $\mathbb{R}^k$. The set
of reals is denoted by $\mathbb{R}$, whereas $\mathbb{R}_+$ denotes the non-negative reals. For $a, b \in \mathbb{R}$,
we will use the notations $a \vee b$ and $a \wedge b$ to denote the maximum and minimum
of $a$ and $b$ respectively. The set of $k \times k$ real matrices is denoted by $\mathbb{R}^{k \times k}$. The
corresponding subspace of symmetric matrices is denoted by $\mathbb{S}^k$. The cone of positive
semidefinite matrices is denoted by $\mathbb{S}^k_{+}$, whereas $\mathbb{S}^k_{++}$ denotes the subset of positive
definite matrices. The $k \times k$ identity matrix is denoted by $I_k$, while $\mathbf{1}_k$ and $\mathbf{0}_k$
denote respectively the column vector of ones and zeros in $\mathbb{R}^k$. Often the symbol $0$
is used to denote the $k \times p$ zero matrix, the dimensions being clear from the context.
The operator $\|\cdot\|$ applied to a vector denotes the standard Euclidean $\ell_2$ norm, while
applied to matrices it denotes the induced $\ell_2$ norm, which is equivalent to the matrix
spectral radius for symmetric matrices. The notation $A \otimes B$ is used to denote the
Kronecker product of two matrices $A$ and $B$.

Time is assumed to be discrete or slotted throughout the paper. The symbols $t$
and $s$ denote time, and $\mathbb{T}_+$ is the discrete index set $\{0, 1, 2, \cdots\}$. The parameter to
be estimated belongs to a subset $\Theta$ (generally open) of the Euclidean space $\mathbb{R}^M$. The
true (but unknown) value of the parameter is $\theta^*$ and a canonical element of $\Theta$ is $\theta$.
The estimate of $\theta^*$ at time $t$ at agent $n$ is $x_n(t) \in \mathbb{R}^M$. Without loss of generality,
the initial estimate, $x_n(0)$, at time $0$ at agent $n$ is a non-random quantity.

Spectral graph theory: The inter-agent communication topology may be described
by an undirected graph $G = (V, E)$, with $V = [1 \cdots N]$ and $E$ the set of agents
(nodes) and communication links (edges), respectively. The unordered pair $(n, l) \in E$
if there exists an edge between nodes $n$ and $l$. We consider simple graphs, i.e., graphs
devoid of self-loops and multiple edges. A graph is connected if there exists a path$^1$
between each pair of nodes. The neighborhood of node $n$ is

$$\Omega_n = \{l \in V \mid (n, l) \in E\}.$$

Node $n$ has degree $d_n = |\Omega_n|$ (the number of edges with $n$ as one end point). The
structure of the graph can be described by the symmetric $N \times N$ adjacency matrix,
$A = [A_{nl}]$, $A_{nl} = 1$, if $(n, l) \in E$, $A_{nl} = 0$, otherwise. Let the degree matrix be the
diagonal matrix $D = \mathrm{diag}(d_1 \cdots d_N)$. By definition, the positive semidefinite matrix
$L = D - A$ is called the graph Laplacian matrix. The eigenvalues of $L$ can be ordered
as $0 = \lambda_1(L) \le \lambda_2(L) \le \cdots \le \lambda_N(L)$, the eigenvector corresponding to $\lambda_1(L)$ being
$(1/\sqrt{N})\mathbf{1}_N$. The multiplicity of the zero eigenvalue equals the number of connected
components of the network; for a connected graph, $\lambda_2(L) > 0$. This second eigenvalue
is the algebraic connectivity or the Fiedler value of the network; see [32] for detailed
treatment of graphs and their spectral theory.
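
The spectral facts above are easy to check numerically. The following is a minimal sketch, assuming NumPy and two small hypothetical graphs (a connected four-node path and a four-node graph with two components); the helper names `laplacian` and `spectrum` are illustrative only.

```python
import numpy as np

def laplacian(num_nodes, edges):
    """Graph Laplacian L = D - A of a simple undirected graph (0-based node labels)."""
    A = np.zeros((num_nodes, num_nodes))
    for n, l in edges:
        A[n, l] = A[l, n] = 1.0
    D = np.diag(A.sum(axis=1))          # degree matrix
    return D - A

def spectrum(L):
    """Eigenvalues of the symmetric matrix L in ascending order."""
    return np.linalg.eigvalsh(L)

# Hypothetical examples: a connected path and a graph with two components.
lam_path = spectrum(laplacian(4, [(0, 1), (1, 2), (2, 3)]))
lam_split = spectrum(laplacian(4, [(0, 1), (2, 3)]))

fiedler_path = lam_path[1]                                   # algebraic connectivity
zero_multiplicity_split = int(np.sum(np.abs(lam_split) < 1e-9))
```

For the path graph the second eigenvalue is strictly positive, whereas the disconnected graph has a zero eigenvalue of multiplicity two, one per connected component.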

2. Problem Formulation. 

2.1. System Model and Preliminaries. Let $\theta^* \in \Theta$ be an $M$-dimensional
(vector) parameter that is to be estimated by a network of $N$ agents. Throughout,
we assume that all the random objects are defined on a common measurable space
$(\Omega, \mathcal{F})$ equipped with a filtration $\{\mathcal{F}_t\}$. For the true (but unknown) parameter value
$\theta^*$, probability and expectation are denoted by $\mathbb{P}_{\theta^*}[\cdot]$ and $\mathbb{E}_{\theta^*}[\cdot]$, respectively. All
inequalities involving random variables are to be interpreted a.s. (almost surely).

$^1$A path between nodes $n$ and $l$ of length $m$ is a sequence $(n = i_0, i_1, \cdots, i_m = l)$ of vertices,
such that $(i_k, i_{k+1}) \in E$ for all $0 \le k \le m - 1$.

Each agent makes i.i.d. (independent and identically distributed) observations of
noisy linear functions of the parameter. The observation model for the $n$-th agent is

$$y_n(t) = H_n \theta^* + \zeta_n(t),$$

where: i) $\{y_n(t) \in \mathbb{R}^{M_n}\}$ is the observation sequence for the $n$-th agent; and ii) for
each $n$, $\{\zeta_n(t)\}$ is a zero-mean temporally i.i.d. noise sequence of bounded variance,
such that $\zeta_n(t)$ is $\mathcal{F}_{t+1}$ adapted and independent of $\mathcal{F}_t$. Moreover, the sequences
$\{\zeta_n(t)\}$ and $\{\zeta_l(t)\}$ are mutually uncorrelated for $n \ne l$. For most practical agent
network applications, each agent observes only a subset of $M_n$ of the components
of $\theta$, with $M_n \ll M$. It is then necessary for the agents to collaborate by means
of occasional local inter-agent message exchanges to achieve a reasonable estimate of
the parameter $\theta^*$. Moreover, due to inherent uncertainties in the deployment and
the sensing environment, the statistics of the observation process (i.e., of the noise)
are likely to be unknown a priori. For example, the exact observation noise variance
at an agent depends on several factors beyond the control of the deployment process
and should be learned over time for reasonable estimation performance. In other
words, prior knowledge of the spatial distribution of the information content (e.g.,
which agent is more accurate than the others) may not be available, and the proposed
estimation approach should be able to adaptively learn the true value of information
leading to an accurate weighting of the various observation resources.

Let $R_n \in \mathbb{S}^{M_n}_{++}$ be the true covariance of the observation noise $\zeta_n(t)$ at agent $n$. It
is well known that, given perfect knowledge of $R_n$ for all $n$, the best linear centralized
estimator $\{x_c(t)\}$ of $\theta^*$ is asymptotically normal, i.e.,

$$\sqrt{tN}\,(x_c(t) - \theta^*) \Longrightarrow \mathcal{N}(0, \Sigma_c^{-1}),$$

provided the matrix $\Sigma_c = \frac{1}{N}\sum_{n=1}^{N} H_n^\top R_n^{-1} H_n$ is invertible. In case the observation
process is Gaussian, the best linear estimator is optimal, and $\Sigma_c$ coincides with the
Fisher information rate. In general, with the knowledge of the covariance only and no
other specifics about the noise distribution, the above estimate is optimal, in that no
other estimate achieves smaller asymptotic covariance than $\Sigma_c^{-1}$ for all distributions
with covariance $R_n$.
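
The invertibility condition on the aggregate information matrix can be checked numerically. The sketch below is illustrative only: it assumes NumPy, borrows the ring sensing structure of Example 2.1 (each agent observes the sum of three neighboring components), and uses hypothetical noise variances. It works with the unnormalized sum $\sum_n H_n^\top R_n^{-1} H_n$, whose inverse is the asymptotic covariance of $\sqrt{t}\,(x_c(t) - \theta^*)$ for the best linear centralized estimator; the normalized Grammian differs only by the factor $1/N$.

```python
import numpy as np

def information_matrix(H_list, R_list):
    """Sum over agents of H_n^T R_n^{-1} H_n (unnormalized Grammian)."""
    return sum(H.T @ np.linalg.inv(R) @ H for H, R in zip(H_list, R_list))

N = M = 5
H_list = []
for n in range(N):
    h = np.zeros((1, M))
    h[0, [(n - 1) % M, n, (n + 1) % M]] = 1.0   # agent n sees three neighboring components
    H_list.append(h)

# Hypothetical noise covariances (1 x 1 matrices); these values are assumptions.
R_list = [np.array([[s ** 2]]) for s in (0.5, 1.0, 1.5, 1.0, 0.5)]

info = information_matrix(H_list, R_list)
local_ranks = [int(np.linalg.matrix_rank(H.T @ H)) for H in H_list]  # each agent alone
global_rank = int(np.linalg.matrix_rank(info))                       # all agents together
central_cov = np.linalg.inv(info)   # asymptotic covariance of sqrt(t) (x_c(t) - theta*)
```

Each local Grammian has rank one, so no agent can identify $\theta^*$ alone, while the aggregate matrix has full rank and its inverse is a valid (positive definite) covariance.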

The goal of this paper is to develop a distributed estimator that leads to asymptotically
normal estimates with the same asymptotic covariance $\Sigma_c^{-1}$ at each agent
under the following constraints: (1) Each agent is aware only of its local observation
model $H_n$ and, more importantly, (2) the true noise covariance $R_n$ is not known a
priori at agent $n$ and needs to be learned from the received observation samples and
exchanged messages with its neighbors over time. Recently, in [33] a distributed algorithm
was introduced that leads to the desired centralized asymptotic covariance
at each agent but requires full model information (i.e., all the $H_n$'s) and the exact
covariance values $R_n$ at all agents. This is due to the fact that, for optimal asymptotic
covariance, the approach in [33] requires an appropriate innovation gain at each agent,
the latter depending on all the model matrices and noise covariances. In the absence
of model and covariance information, a natural alternative is to employ a certainty-equivalence
type approach in which an adaptive sequential gain refinement (learning)
step is incorporated into the desired estimation task. In this paper, we show that
such a learning process (see Section 2.2) is feasible in a distributed setting and, more
importantly, the coupling between the learning and parameter estimation tasks does
not slow down the convergence rate (measured in terms of asymptotic covariance) of
the latter to $\theta^*$.

2.2. Adaptive Distributed Estimator: Algorithm ADLE. The adaptive
distributed linear estimator (ADLE) involves two simultaneous update rules, namely,
(1) the estimate (state) update and (2) the gain update. To formalize, let $\{x_n(t)\}$
denote the $\{\mathcal{F}_t\}$ adapted sequence of estimates of $\theta^*$ at agent $n$.

Estimate Update: The estimate update at agent $n$ then proceeds as follows:

$$x_n(t+1) = x_n(t) - \beta_t \sum_{l \in \Omega_n(t)} \left(x_n(t) - x_l(t)\right) + \alpha_t K_n(t)\left(y_n(t) - H_n x_n(t)\right). \qquad (2.1)$$

In the above, $\{\beta_t\}$ and $\{\alpha_t\}$ represent appropriate time-varying weighting factors for
the consensus (agreement) and innovation (new observation) potentials respectively,
whereas $\{K_n(t)\}$ is an adaptively chosen $\{\mathcal{F}_t\}$-adapted matrix gain process. Also,
$\Omega_n(t)$ denotes the $\{\mathcal{F}_{t+1}\}$-adapted time-varying random neighborhood of agent $n$ at
time $t$.

Gain Update: The adaptive gain update at sensor $n$ involves another $\{\mathcal{F}_t\}$
adapted distributed learning process that proceeds in parallel with the estimate update.
In particular, we set

$$K_n(t) = \left(G_n(t) + \gamma_t I_M\right)^{-1} H_n^\top \left(Q_n(t) + \gamma_t I_{M_n}\right)^{-1}, \qquad (2.2)$$

where $\{\gamma_t\}$ is a sequence of positive reals, such that $\gamma_t \to 0$ as $t \to \infty$, and the positive
semidefinite matrix sequences $\{Q_n(t)\}$ and $\{G_n(t)\}$ evolve as follows:

Qn(t + i) = fiEy"(^)l fiEy"(^)l ' (2-3) 

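and

$$G_n(t+1) = G_n(t) - \beta_t \sum_{l \in \Omega_n(t)} \left(G_n(t) - G_l(t)\right) + \alpha_t \left(H_n^\top \left(Q_n(t) + \gamma_t I_{M_n}\right)^{-1} H_n - G_n(t)\right) \qquad (2.4)$$

with positive semidefinite initial conditions $Q_n(0)$ and $G_n(0)$ respectively.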

Before discussing further, we formalize assumptions on the model, the time-varying
communication topology and the algorithm weight sequences $\{\alpha_t\}$ and $\{\beta_t\}$
in the following:

(A.1): The observation model is globally observable, i.e., the (normalized) Grammian
matrix

$$\Sigma_c = \frac{1}{N} \sum_{n=1}^{N} H_n^\top R_n^{-1} H_n$$

is invertible, where $R_n$ denotes the non-singular true (but unknown) covariance of the
observation noise $\zeta_n(t)$ at agent $n$.

(A.2): The $\{\mathcal{F}_{t+1}\}$-adapted sequence $\{L_t\}$ of communication network Laplacians
(modeling the agent communication neighborhoods $\{\Omega_n(t)\}$ at each time $t$) is temporally
i.i.d. with $L_t$ being independent of $\mathcal{F}_t$ for each $t$. Further, the sequence $\{L_t\}$
is connected on the average, i.e., $\lambda_2(\overline{L}) > 0$, where $\overline{L} = \mathbb{E}_{\theta^*}[L_t]$ denotes the mean
Laplacian.

(A.3): The sequences $\{L_t\}$ and $\{\zeta_n(t)\}_{n \in V}$ are mutually independent.

(A.4): There exists $\varepsilon_1 > 0$, such that for all $n$, $\mathbb{E}_{\theta^*}\left[\|\zeta_n(t)\|^{2+\varepsilon_1}\right] < \infty$.

(A.5): The weight sequences $\{\alpha_t\}$ and $\{\beta_t\}$ are given by

$$\alpha_t = \frac{a}{(t+1)^{\tau_1}} \quad \text{and} \quad \beta_t = \frac{b}{(t+1)^{\tau_2}}, \qquad (2.5)$$

where $a, b > 0$, $0 < \tau_2 < \tau_1 \le 1$ and $\tau_1 > \tau_2 + 1/(2 + \varepsilon_1) + 1/2$.
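
As a small illustration of (A.5), the sketch below checks the exponent constraints and evaluates the two weight sequences. It assumes plain Python; the function names and the numerical values of $a$, $b$, $\tau_1$, $\tau_2$, $\varepsilon_1$ are hypothetical choices, not values prescribed by the paper.

```python
def satisfies_a5(a, b, tau1, tau2, eps1):
    """Check the constraints of (A.5) on the weight-sequence parameters."""
    return (a > 0 and b > 0
            and 0 < tau2 < tau1 <= 1
            and tau1 > tau2 + 1.0 / (2.0 + eps1) + 0.5)

def weights(t, a=1.0, b=0.2, tau1=1.0, tau2=0.3):
    """Return (alpha_t, beta_t) as in (2.5)."""
    return a / (t + 1) ** tau1, b / (t + 1) ** tau2

ok = satisfies_a5(1.0, 0.2, 1.0, 0.3, eps1=10.0)      # 0.3 + 1/12 + 0.5 < 1
bad = satisfies_a5(1.0, 0.2, 1.0, 0.45, eps1=0.5)     # 0.45 + 0.4 + 0.5 > 1

# The consensus weight dominates the innovation weight asymptotically.
alpha_10, beta_10 = weights(10)
alpha_1000, beta_1000 = weights(1000)
ratio_small, ratio_large = beta_10 / alpha_10, beta_1000 / alpha_1000
```

With $\tau_1 > \tau_2$ the ratio $\beta_t / \alpha_t = (b/a)(t+1)^{\tau_1 - \tau_2}$ grows without bound, which is the mixed time-scale feature exploited in the analysis.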

Remark 2.1. Note that the global observability requirement in (A.1) is quite
weak and, in fact, is necessary to attain estimator consistency in a centralized setting.
In a sense, the assumption (A.1) on the global sensing model and the mean
connectivity condition in (A.2) provide minimal structural conditions for attaining
distributed observability, i.e., the ability to obtain consistent parameter estimates in
the proposed distributed information setting. Intuitively, the necessity of (A.2) (in
addition to (A.1)) for such distributed observability stems from the observation that,
in general, in the absence of local observability a disconnected inter-agent communication
network would lead to multiple communication-disjoint agent components, none
with sufficiently informative measurements to consistently estimate the true parameter.
We emphasize that the mean network connectivity assumption formalized in (A.2),
which generalizes the notion of connectivity in static communication topologies to dynamic
stochastic scenarios, models a broad class of agent networks with unpredictable
communication; for instance, (A.2) allows for spatially correlated communication link
failures (often resulting from multi-agent interference) and subsumes the commonly
used packet erasure model in gossip type of agent communications [34]. On the other
hand, in the current setting, we assume that the inter-agent communication is noise-free
and unquantized in the event of an active communication link; the problem of
quantized data exchange in networked control systems (see, for example, [6], [35]-[37])
is an active research topic.

We comment on the choice of the weight sequences $\{\beta_t\}$ and $\{\alpha_t\}$ associated with
the consensus and innovation potentials respectively. From (A.5) we note that both
the excitations for agent-collaboration (consensus) and local innovation are persistent,
i.e., the sequences $\{\beta_t\}$ and $\{\alpha_t\}$ sum to $\infty$, a standard requirement in stochastic approximation
type algorithms to drive the updates to the desired limit from arbitrary
initial conditions. Further, the square summability of $\{\alpha_t\}$ ($\tau_1 > 1/2$) is required
to mitigate the effect of stochastic sensing noise perturbing the innovations. The requirement
$\beta_t / \alpha_t \to \infty$ as $t \to \infty$ ($\tau_1 > \tau_2$), i.e., the asymptotic domination of the
consensus potential over the local innovations, ensures the right information mixing,
thus, as shown below, leading to optimal estimation performance. Technically, the
different asymptotic decay rates of the two potentials lead to mixed time-scale stochastic
recursions whose analyses require new techniques in stochastic approximation as
developed in the paper.

Example 2.1. As an illustration, consider the agent model in Fig. 2.1 with
$N = 5$ agents. The vector parameter $\theta^* \in \mathbb{R}^5$ in this example may have a physical
interpretation, for example, with $\theta^*_n$, the $n$-th component of $\theta^*$, indicating the (unknown)
intensity of a source geographically co-located with agent $n$, $n = 1, \cdots, 5$.
Each agent $n$ observes a scalar sequence

$$y_n(t) = \left(\theta^*_{n-1} + \theta^*_n + \theta^*_{n+1}\right) + \zeta_n(t),$$

perhaps corresponding to a superposition of local source intensities, where we adopt
the convention that $\theta^*_0 = \theta^*_5$ and $\theta^*_6 = \theta^*_1$. It is readily seen that the local agent
observations are not globally observable for $\theta^*$. In fact, in this example, without
collaboration, no agent $n$ would be able to reconstruct even the local intensity $\theta^*_n$.
The collective observation model is however globally observable for $\theta^*$, i.e., (A.1)
holds. The dotted lines denote the potential inter-agent communication links (possibly
switching stochastically between on and off) through which the locally unobservable
agents may collaborate by information exchange.

Fig. 2.1. An example: circles depict agents and dotted lines bi-directional communication links.

By abstracting the above model in
terms of the generic notation, the ADLE estimate update rule (2.1) at an agent $n$,
say $n = 3$, then takes the form

$$x_3(t+1) = x_3(t) - \beta_t \left(2 x_3(t) - x_1(t) - x_4(t)\right) + \alpha_t \left(G_3(t) + \gamma_t I_5\right)^{-1} H_3^\top \left(Q_3(t) + \gamma_t\right)^{-1} \left(y_3(t) - x_{3,2}(t) - x_{3,3}(t) - x_{3,4}(t)\right),$$

where $H_3 = [0\ 1\ 1\ 1\ 0]$, $Q_3(t)$ denotes the (scalar) sample covariance (2.3), $x_{3,l}(t)$
denotes the $l$-th component of $x_3(t)$ with $l = 2, 3, 4$, and $G_3(t)$ is updated as

$$G_3(t+1) = G_3(t) - \beta_t \left(2 G_3(t) - G_1(t) - G_4(t)\right) + \alpha_t \left(H_3^\top \left(Q_3(t) + \gamma_t\right)^{-1} H_3 - G_3(t)\right).$$

In the above, we assumed that at time $t$, both the communication links (1, 3) and
(3, 4) are active. Assuming that the stochasticity, if any, in the link formation satisfies
(A.2), the following analysis will show that the above estimate sequences will
optimally converge to $\theta^*$ a.s. as $t \to \infty$.
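
The updates (2.1), (2.2) and (2.4) can be exercised on this five-agent model with a short simulation. The following is only a sketch under stated assumptions, not the authors' implementation: it assumes NumPy; the true parameter, the noise standard deviations, the link set (chosen so that agent 3 has neighbors 1 and 4, with 0-based labels in the code), the link activation probability, the constants $a = 1$, $b = 0.2$, $\tau_1 = 1$, $\tau_2 = 0.3$ and the choice $\gamma_t = 1/(t+1)$ are all hypothetical. The sample covariance (2.3) is taken here to be the running (biased) sample variance of the local observations, which is an assumption about its exact form.

```python
import numpy as np

def simulate(T=20000, seed=0):
    """Run updates (2.1), (2.2), (2.4) on a 5-agent ring-sensing model (illustrative)."""
    rng = np.random.default_rng(seed)
    N = M = 5
    theta = np.arange(1.0, 6.0)                     # hypothetical true parameter
    H = np.zeros((N, M))                            # row n is the 1 x M matrix H_n
    for n in range(N):
        H[n, [(n - 1) % M, n, (n + 1) % M]] = 1.0
    sigma = np.array([0.5, 1.0, 1.5, 1.0, 0.5])     # noise std devs, never used by agents
    edges = [(0, 1), (0, 2), (2, 3), (3, 4), (1, 4)]  # assumed links, 0-based labels
    HtH = H[:, :, None] * H[:, None, :]             # stacked H_n^T H_n, shape (N, M, M)

    x = np.zeros((N, M))                            # estimates x_n(t)
    G = np.zeros((N, M, M))                         # Grammian approximations G_n(t)
    Q = np.zeros(N)                                 # scalar sample variances Q_n(t)
    mean = np.zeros(N)                              # running means of y_n
    m2 = np.zeros(N)                                # running sums of squared deviations

    for t in range(T):
        alpha = 1.0 / (t + 1)                       # a = 1, tau_1 = 1
        beta = 0.2 / (t + 1) ** 0.3                 # b = 0.2, tau_2 = 0.3
        gamma = 1.0 / (t + 1)                       # positive sequence with gamma_t -> 0

        # Random Laplacian L_t: each link is active independently with probability 0.7.
        L = np.zeros((N, N))
        for i, j in edges:
            if rng.random() < 0.7:
                L[i, i] += 1.0
                L[j, j] += 1.0
                L[i, j] -= 1.0
                L[j, i] -= 1.0

        y = H @ theta + sigma * rng.standard_normal(N)
        w = 1.0 / (Q + gamma)                       # (Q_n(t) + gamma_t)^{-1}, one per agent

        # Gain (2.2): K_n = (G_n + gamma I)^{-1} H_n^T (Q_n + gamma)^{-1}.
        K = np.linalg.solve(G + gamma * np.eye(M), H[:, :, None])[:, :, 0] * w[:, None]

        innovation = y - np.sum(H * x, axis=1)      # y_n(t) - H_n x_n(t)
        x_next = x - beta * (L @ x) + alpha * K * innovation[:, None]
        G_next = (G - beta * np.einsum("nl,lij->nij", L, G)
                  + alpha * (w[:, None, None] * HtH - G))

        # Running sample variance of y_n(0..t), standing in for (2.3).
        delta = y - mean
        mean += delta / (t + 1)
        m2 += delta * (y - mean)
        Q = m2 / (t + 1)

        x, G = x_next, G_next

    return float(np.max(np.linalg.norm(x - theta, axis=1)))

worst_error = simulate()
```

Here `worst_error` is the largest Euclidean estimation error over the five agents at the final time. The consensus terms are written through the Laplacian, since $\sum_{l \in \Omega_n(t)} (x_n(t) - x_l(t))$ is the $n$-th block of $L_t x(t)$.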

3. Main Results. We formally state the main results of the paper, the proofs
being provided in Section 6.

The first result concerns the asymptotic agreement or consensus among the various
agent estimates.

Theorem 3.1. Let assumptions (A.1)-(A.5) hold. Then for each $\tau_0$ such that

$$0 \le \tau_0 < \tau_1 - \tau_2 - \frac{1}{2 + \varepsilon_1},$$

we have

$$\mathbb{P}_{\theta^*}\left(\lim_{t \to \infty} (t+1)^{\tau_0} \|x_n(t) - x_l(t)\| = 0\right) = 1$$

for any pair of agents $n$ and $l$.

Theorem 3.1 relates the rate of agreement to the difference $\tau_1 - \tau_2$ of the algorithm
weight parameters, the latter quantifying the relative intensities of the global
agreement and local innovation potentials. Notably, the order of this convergence is
independent of the network topology (as long as it is connected in the mean) and the
distributed gain learning process (2.2)-(2.4). In fact, as will be evident from the proof
arguments, the local covariance learning step in (2.3) may be replaced by any other
consistent learning procedure, still retaining the order of convergence in Theorem 3.1.

Theorem 3.2. Let assumptions (A.1)-(A.5) hold with $\tau_1 = 1$ and $a \ge 1$. Then,
for each $n$ the estimate sequence $\{x_n(t)\}$ is strongly consistent. In particular, we have

$$\mathbb{P}_{\theta^*}\left(\lim_{t \to \infty} (t+1)^{\tau} \|x_n(t) - \theta^*\| = 0\right) = 1 \qquad (3.1)$$

for each $n$ and $\tau \in [0, 1/2)$.

The consistency in Theorem 3.2 is order optimal in that (3.1) fails to hold (unless
the noise covariances are degenerate) with an exponent $\tau \ge 1/2$ for any (including
centralized) estimation procedure, which is due to the fact that the optimal (centralized)
estimator is asymptotically normal with non-degenerate (positive definite)
asymptotic covariance.

The next result concerns the asymptotic efficiency of the estimates generated by
the distributed ADLE.

Theorem 3.3. Let assumptions (A.1)-(A.5) hold with $\tau_1 = 1$ and $a = 1$, and
let $\Sigma_c$ denote the normalized Grammian of (A.1). Then, for each $n$

$$\sqrt{t+1}\,\left(x_n(t) - \theta^*\right) \Longrightarrow \mathcal{N}\left(0, (N\Sigma_c)^{-1}\right),$$

where $\mathcal{N}(\cdot, \cdot)$ and $\Longrightarrow$ denote the Gaussian distribution and weak convergence, respectively.



Referring to the introductory discussion in Section 2.1 we note that the distributed
and adaptive ADLE achieves the best error covariance decay among the
class of linear centralized estimators and is optimal in the Fisher information sense if
the noise process is Gaussian. In a sense, Theorem 3.3 reinforces the applicability and
advantage of distributed estimation schemes. Apart from issues of robustness, implementing
a centralized estimator is communication-intensive as it requires transmitting
all sensor data to a fusion center at all times. On the other hand, the distributed
ADLE algorithm involves only sparse local communication among the sensors at each
step, and achieves the performance of a centralized estimator asymptotically as long
as the communication network stays connected in the mean. Further, note that the
assumption $a = 1$ is not necessary for asymptotic normality of the ADLE estimates;
however, the optimality (asymptotic efficiency) is no longer guaranteed for $a \ne 1$, i.e.,
the resulting asymptotic covariance of the estimates deviates from $(N\Sigma_c)^{-1}$.

4. Some Approximation Results. In this section we establish several strong
(pathwise) convergence results for generic mixed time-scale stochastic recursive procedures
(the proofs being provided in Appendix A). These are of independent interest
and will be used in subsequent sections to analyze the properties of the ADLE scheme.

Throughout this section, by $\{z_t\}$, we will denote an $\{\mathcal{F}_t\}$ adapted stochastic
process taking values in some Euclidean space or some subset of symmetric matrices.
The initial condition $z_0$ will be assumed to be deterministic unless otherwise stated.
Further, the probability space is assumed to be rich enough to allow the definition
of various auxiliary processes governing the recursive evolution of $\{z_t\}$. Since the
results in this section concern generic stochastic processes not necessarily tied to the
parameter vector, the $\theta^*$ indexing in the probability and expectation will be dropped
temporarily.

We start by quoting a convergence rate result from [33] on deterministic recursions
with time-varying coefficients.

Lemma 4.1 (Lemmas 4 and 5 of [33]). Let $\{z_t\}$ be an $\mathbb{R}_+$ valued sequence satisfying

$$z_{t+1} \le (1 - r_1(t))\, z_t + r_2(t),$$

where $\{r_1(t)\}$ and $\{r_2(t)\}$ are deterministic sequences with

$$\frac{a_1}{(t+1)^{\delta_1}} \le r_1(t) \le 1 \quad \text{and} \quad r_2(t) \le \frac{a_2}{(t+1)^{\delta_2}},$$

and $a_1 > 0$, $a_2 > 0$, $0 \le \delta_1 < 1$, $\delta_2 > 0$. Then, if $\delta_1 < \delta_2$, $(t+1)^{\delta_0} z_t \to 0$ as $t \to \infty$,
for all $0 \le \delta_0 < \delta_2 - \delta_1$. Also, if $\delta_1 = \delta_2$, the sequence $\{z_t\}$ remains bounded, i.e.,
$\sup_{t \ge 0} \|z_t\| < \infty$.
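
A quick numerical illustration of the decay rate in Lemma 4.1 is sketched below, assuming plain Python. The recursion is run with equality and with the extreme admissible sequences $r_1(t) = a_1/(t+1)^{\delta_1}$ and $r_2(t) = a_2/(t+1)^{\delta_2}$; the particular constants ($a_1 = a_2 = 1$, $\delta_1 = 0.5$, $\delta_2 = 1$, $\delta_0 = 0.3$) are hypothetical and chosen so that $\delta_0 < \delta_2 - \delta_1$.

```python
def scaled_trajectory(a1, a2, d1, d2, d0, T, z0=1.0):
    """Return the list of (t+1)^d0 * z_t for t = 0, ..., T-1, where
    z_{t+1} = (1 - a1/(t+1)^d1) z_t + a2/(t+1)^d2."""
    z = z0
    scaled = []
    for t in range(T):
        scaled.append((t + 1) ** d0 * z)
        r1 = a1 / (t + 1) ** d1
        r2 = a2 / (t + 1) ** d2
        z = (1.0 - r1) * z + r2
    return scaled

traj = scaled_trajectory(a1=1.0, a2=1.0, d1=0.5, d2=1.0, d0=0.3, T=100000)
early, late = traj[999], traj[-1]     # scaled values at t = 999 and t = 99999
```

In this instance $z_t$ tracks $r_2(t)/r_1(t) = (t+1)^{-(\delta_2 - \delta_1)}$, so the scaled sequence $(t+1)^{\delta_0} z_t$ keeps shrinking, consistent with the lemma; a finite run of course only illustrates the limit and does not prove it.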

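We now develop a stochastic analog of Lemma 4.1 in which the weight sequence
$\{r_1(t)\}$ is a random process with some mixing conditions.

Lemma 4.2. Let $\{z_t\}$ be an $\{\mathcal{F}_t\}$ adapted $\mathbb{R}_+$ valued process satisfying

$$z_{t+1} \le (1 - r_1(t))\, z_t + r_2(t).$$

In the above, $\{r_1(t)\}$ is an $\{\mathcal{F}_{t+1}\}$ adapted process, such that for all $t$, $r_1(t)$ satisfies
$0 \le r_1(t) \le 1$ and

$$\frac{a_1}{(t+1)^{\delta_1}} \le \mathbb{E}\left[r_1(t) \mid \mathcal{F}_t\right] \le 1$$

with $a_1 > 0$ and $0 \le \delta_1 < 1$. The sequence $\{r_2(t)\}$ is deterministic, $\mathbb{R}_+$ valued, and
satisfies $r_2(t) \le a_2/(t+1)^{\delta_2}$ with $a_2 > 0$ and $\delta_2 > 0$. Then, if $\delta_1 < \delta_2$, $(t+1)^{\delta_0} z_t \to 0$ a.s.
as $t \to \infty$ for all $0 \le \delta_0 < \delta_2 - \delta_1$.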

Versions of Lemma 4.2 with stronger assumptions on the weight sequences were
used in earlier work. For example, the deterministic version (Lemma 4.1) was proved
in [6], whereas a version with i.i.d. weight sequences was used in [33]. Further, several
variants under varying assumptions exist in the literature based on generalized supermartingale
convergence theorems; see for example [25], [26], [38]. However, for reasons to
be clear soon, in this work there will be instances in which the memoryless assumption
on the weight sequences is too restrictive. Hence, we develop the version stated in
Lemma 4.2.

The following result will be used to quantify the rate of convergence of distributed 
vector or matrix valued recursions to their network-averaged behavior. 

Lemma 4.3. Let $\{z_t\}$ be an $\mathbb{R}_+$ valued $\{\mathcal{F}_t\}$ adapted process that satisfies

$$z_{t+1} \le (1 - r_1(t))\, z_t + r_2(t)\, U_t\, (1 + J_t).$$

Let the weight sequences $\{r_1(t)\}$ and $\{r_2(t)\}$ satisfy the hypothesis of Lemma 4.2. Further,
let $\{U_t\}$ and $\{J_t\}$ be $\mathbb{R}_+$ valued $\{\mathcal{F}_t\}$ and $\{\mathcal{F}_{t+1}\}$ adapted processes respectively
with $\sup_{t \ge 0} \|U_t\| < \infty$ a.s. The process $\{J_t\}$ is i.i.d. with $J_t$ independent of $\mathcal{F}_t$ for
each $t$ and satisfies the moment condition $\mathbb{E}\left[\|J_t\|^{2+\varepsilon_1}\right] < \kappa < \infty$ for some $\varepsilon_1 > 0$ and
a constant $\kappa > 0$. Then, for every $\delta_0$ such that

$$0 \le \delta_0 < \delta_2 - \delta_1 - \frac{1}{2 + \varepsilon_1},$$

we have $(t+1)^{\delta_0} z_t \to 0$ a.s. as $t \to \infty$.

The key difference between Lemma 4.3 and Lemma 4.2 is that the processes
associated with the sequence $\{r_2(t)\}$ are now stochastic.



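Lemma 4.4. Let $\{z_t\}$ be an $\mathbb{R}^{NM}$ valued $\{\mathcal{F}_t\}$ adapted process such that $z_t \in \mathcal{C}^{\perp}$
(see (B.1) in Appendix B for the definition of the consensus subspace $\mathcal{C}$ and its
orthogonal complement $\mathcal{C}^{\perp}$) for all $t$. Also, let $\{L_t\}$ be an i.i.d. sequence of Laplacian
matrices as in assumption (A.2) that satisfies

$$\lambda_2(\overline{L}) = \lambda_2\left(\mathbb{E}[L_t]\right) > 0,$$

with $L_t$ being $\mathcal{F}_{t+1}$ adapted and independent of $\mathcal{F}_t$ for all $t$. Then there exists a
measurable $\{\mathcal{F}_{t+1}\}$ adapted $\mathbb{R}_+$ valued process $\{r_t\}$ (depending on $\{z_t\}$ and $\{L_t\}$) and
a constant $c_r > 0$, such that $0 \le r_t \le 1$ a.s. and

$$\left\|\left(I_{NM} - \beta_t L_t \otimes I_M\right) z_t\right\| \le (1 - r_t)\, \|z_t\|$$

with

$$\mathbb{E}\left[r_t \mid \mathcal{F}_t\right] \ge \frac{c_r}{(t+1)^{\tau_2}} \quad \text{a.s.} \qquad (4.1)$$

for all $t$ large enough, where the weight sequence $\{\beta_t\}$ and $\tau_2$ are defined in (2.5).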

Remark 4.1. We comment on the necessity of the various technicalities in the
statement of Lemma 4.4. Let $\mathcal{P}_{NM}$ denote the matrix $(1/N)\,(\mathbf{1}_N \otimes I_M)(\mathbf{1}_N \otimes I_M)^\top$,
so that $\mathcal{P}_{NM} z_t = 0$ since $z_t \in \mathcal{C}^{\perp}$. With this, a naive approach of showing the existence
of such a process $\{r_t\}$ would be to use the submultiplicative inequality

$$\left\|\left(I_{NM} - \beta_t L_t \otimes I_M - \mathcal{P}_{NM}\right) z_t\right\| \le \left\|I_{NM} - \beta_t L_t \otimes I_M - \mathcal{P}_{NM}\right\| \, \|z_t\|.$$

Using properties of the Laplacian and the matrix $\mathcal{P}_{NM}$, it can be shown that for
sufficiently large $t$

$$\left\|\left(I_{NM} - \beta_t L_t \otimes I_M - \mathcal{P}_{NM}\right) z_t\right\| \le \left(1 - \beta_t \lambda_2(L_t)\right) \|z_t\|.$$

With this we may choose to define the desired sequence $\{r_t\}$ in Lemma 4.4 by

$$r_t = \beta_t \lambda_2(L_t) \qquad (4.2)$$

for all $t$. Indeed, $\{r_t\}$ thus defined satisfies $0 \le r_t \le 1$ and (4.4) (at least for $t$ large
enough). Since $L_t$ is independent of $\mathcal{F}_t$, we obtain

$$\mathbb{E}\left[\lambda_2(L_t) \mid \mathcal{F}_t\right] = \mathbb{E}\left[\lambda_2(L_t)\right] \le \lambda_2(\overline{L}),$$

where the last inequality is a consequence of Jensen's inequality applied to the concave
functional $\lambda_2(\cdot)$. Thus the hypothesis $\lambda_2(\overline{L}) > 0$ does not shed any light on whether
$\mathbb{E}[\lambda_2(L_t)] > 0$ or not. Unfortunately, it turns out that in the gossip type of communication
setting, in which none of the network instances are connected, $\lambda_2(L_t) = 0$
a.s. Hence, in such cases $\mathbb{E}[\lambda_2(L_t)]$ is actually 0. This in turn implies that the $\{r_t\}$
proposed in (4.2) violates the requirement (4.1) of Lemma 4.4. This necessitates an
altogether different approach for constructing the desired sequence $\{r_t\}$. As shown in
the following, such an $r_t$ is no longer independent of $\mathcal{F}_t$, being a function of both $L_t$
and $z_t$ in general.
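
The gap between $\mathbb{E}[\lambda_2(L_t)]$ and $\lambda_2(\overline{L})$ described above can be seen on a small gossip example. The sketch below assumes NumPy and a hypothetical four-agent network in which, at each time, exactly one of the four links of a ring is active, each with probability 1/4; the helper names are illustrative.

```python
import numpy as np

def edge_laplacian(num_nodes, n, l):
    """Laplacian of the graph on num_nodes nodes whose only edge is (n, l)."""
    L = np.zeros((num_nodes, num_nodes))
    L[n, n] = L[l, l] = 1.0
    L[n, l] = L[l, n] = -1.0
    return L

def fiedler(L):
    """Second smallest eigenvalue of a symmetric Laplacian."""
    return np.linalg.eigvalsh(L)[1]

N = 4
ring_links = [(0, 1), (1, 2), (2, 3), (3, 0)]
instances = [edge_laplacian(N, n, l) for n, l in ring_links]   # equally likely L_t

instance_values = [fiedler(L) for L in instances]   # lambda_2(L_t) for each realization
mean_laplacian = sum(instances) / len(instances)    # mean Laplacian E[L_t]
mean_value = fiedler(mean_laplacian)                # lambda_2 of the mean Laplacian
```

Every realization is a disconnected graph, so $\lambda_2(L_t) = 0$ for each of them and hence $\mathbb{E}[\lambda_2(L_t)] = 0$, while the mean Laplacian is one quarter of the ring Laplacian and has $\lambda_2(\overline{L}) = 1/2 > 0$.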

5. Convergence and Asymptotic Properties. In this section we establish
asymptotic properties of the ADLE and the associated distributed learning and estimation
processes. The key technical result concerning the adaptive gain learning
process is presented in Lemma 5.1, whereas the major convergence properties of the
estimate processes concerning boundedness of the agent estimates, inter-agent estimate
consensus and estimate consistency are obtained in Lemma 5.4, Lemma 5.7
and Lemma 5.9 respectively. The assumptions (A.1)-(A.5) are assumed to hold
throughout the section.

The following result concerns the convergence of the online gain approximation processes $\{K_n(t)\}$ to their optimal counterparts $K_n = \Sigma_c^{-1} H_n^T R_n^{-1}$.

Lemma 5.1. For each n the gain sequence $\{K_n(t)\}$ (given by (2.2)-(2.4)) converges to $K_n = \Sigma_c^{-1} H_n^T R_n^{-1}$ a.s., i.e.,

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} K_n(t) = \Sigma_c^{-1} H_n^T R_n^{-1} \Big) = 1. \]



The proof is accomplished in terms of several intermediate steps that highlight the interaction between the dynamics of distributed collaboration and local adaptation. To this end, we first investigate the processes $\{G_n(t)\}$; see (2.4). The processes $\{G_n(t)\}$ may be viewed as approximations of the normalized Grammian and, as will be shown in the following, converge to $\Sigma_c$. The following assertion concerns the consensus of the approximate Grammians to their network average (see Appendix B for a proof):

Lemma 5.2. For each n we have

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} \|G_n(t) - G_{avg}(t)\| = 0 \Big) = 1, \]

where $G_{avg}(t) = (1/N) \sum_{n=1}^{N} G_n(t)$ is the instantaneous network-averaged Grammian.

On the basis of Lemma 5.2, to show the convergence of the approximate (normalized) Grammian sequences to $\Sigma_c$, it suffices to show the convergence of the network-averaged sequence $\{G_{avg}(t)\}$ to the latter. This is obtained in the following lemma (see Appendix B for a proof).

Lemma 5.3. The following holds:

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} G_{avg}(t) = \Sigma_c \Big) = 1. \]

We now complete the proof of Lemma 5.1.



Proof. [Proof of Lemma 5.1] It follows from Lemma 5.2 and Lemma 5.3 that

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} G_n(t) = \Sigma_c \Big) = 1 \tag{5.1} \]

for all $n = 1, \cdots, N$. The assertion in Lemma 5.1 is immediate from (5.1) and the observation that $Q_n(t) \to R_n$ and $\gamma_t \to 0$ as $t \to \infty$. □



The rest of the section is concerned with the convergence analysis of the estimate sequences $\{x_n(t)\}$ generated by the $\mathcal{ADLE}$. Several results on the convergence behavior of the estimates are presented, culminating in the proofs of the main results of the paper in Section 6.

Lemma 5.4. The estimate sequences $\{x_n(t)\}$ generated by the $\mathcal{ADLE}$ algorithm (see (2.1)) are pathwise bounded, i.e., for each n, $\sup_{t \ge 0} \|x_n(t)\| < \infty$ a.s.



The proof of this lemma involves a Lyapunov type argument. The decay rate estimates obtained in the next two propositions (see Appendix B for proofs) are associated with certain time-varying spectral operators. They will be used in the construction of a suitable Lyapunov function needed in the proof of Lemma 5.4 given below.

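Proposition 5.5. Let $\mathcal{K} = \mathrm{diag}(K_1, \cdots, K_N)$, with the $K_n$ as in Lemma 5.1, and let $\mathcal{H} = \mathrm{diag}(H_1, \cdots, H_N)$. Then, there exist $\varepsilon_{\mathcal{K}} > 0$, a (deterministic) time $t_{\mathcal{K}}$ and a constant $c_{\mathcal{K}} > 0$, such that,

\[ z^T \big( \beta_t \overline{L} \otimes I_M + \alpha_t \widetilde{\mathcal{K}} \mathcal{H} \big) z \ge c_{\mathcal{K}} \alpha_t \|z\|^2 \]

for all $t \ge t_{\mathcal{K}}$, $z \in \mathbb{R}^{NM}$, and $\widetilde{\mathcal{K}}$ satisfying $\|\widetilde{\mathcal{K}} \mathcal{H} - \mathcal{K} \mathcal{H}\| \le \varepsilon_{\mathcal{K}}$.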

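Proposition 5.6. Let $\mathcal{K}$ and $\mathcal{H}$ be defined as in Proposition 5.5. Then, for every $0 < \varepsilon < 1$ there exist a deterministic time $t_\varepsilon$ and a constant $c_\varepsilon > 0$, such that,

\[ z^T \big( \beta_t \overline{L} \otimes I_M + \alpha_t \widetilde{\mathcal{K}} \mathcal{H} \big) z \ge c_\varepsilon \beta_t \|z_{\mathcal{C}^\perp}\|^2 \]

for all $t \ge t_\varepsilon$, $z \in \mathbb{R}^{NM}$ and $\widetilde{\mathcal{K}}$ satisfying

\[ \|\widetilde{\mathcal{K}} \mathcal{H} - \mathcal{K} \mathcal{H}\| \le \varepsilon. \tag{5.2} \]

Also, in the above $z_{\mathcal{C}^\perp}$ denotes the projection of z onto the orthogonal complement of the consensus subspace $\mathcal{C}$ as defined in (B.11) in Appendix B.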



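Proof. [Proof of Lemma 5.4] The estimator recursions in (2.1) may be written as

\[ x_{t+1} = \big( I_{NM} - \beta_t \overline{L} \otimes I_M - \alpha_t \mathcal{K}_t \mathcal{H} \big) x_t - \beta_t \big( \widetilde{L}_t \otimes I_M \big) x_t + \alpha_t \mathcal{K}_t y_t, \]

with $x_t$ and $y_t$ denoting $[x_1^T(t), \cdots, x_N^T(t)]^T$ and $[y_1^T(t), \cdots, y_N^T(t)]^T$ respectively. The sequence $\{\widetilde{L}_t\}$ denotes the sequence of zero mean i.i.d. matrices given by $\widetilde{L}_t = L_t - \overline{L}$ for all t. The process $\{z_t\}$ defined as $z_t = x_t - 1_N \otimes \theta^*$ may then be shown to satisfy the recursion

\[ z_{t+1} = \big( I_{NM} - \beta_t \overline{L} \otimes I_M - \alpha_t \mathcal{K}_t \mathcal{H} \big) z_t - \beta_t \big( \widetilde{L}_t \otimes I_M \big) z_t + \alpha_t \mathcal{K}_t \zeta_t, \]

with $\zeta_t = [\zeta_1^T(t), \cdots, \zeta_N^T(t)]^T$. Now fix $0 < \varepsilon < \varepsilon_{\mathcal{K}} \wedge 1$, where $\varepsilon_{\mathcal{K}}$ is defined in the hypothesis of Proposition 5.5. Since $\mathcal{K}_t \to \mathcal{K}$ a.s., by Egorov's theorem ([39]) for every $\delta > 0$, there exists $t_\delta$ such that

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge t_\delta} \|\mathcal{K}_t \mathcal{H} - \mathcal{K} \mathcal{H}\| \le \varepsilon \Big) > 1 - \delta \quad \text{and} \quad \mathbb{P}_{\theta^*}\Big( \sup_{t \ge t_\delta} \|\mathcal{K}_t - \mathcal{K}\| \le \varepsilon \Big) > 1 - \delta. \]

Moreover, such a $t_\delta$ may be chosen to satisfy $t_\delta > t_{\mathcal{K}} \vee t_\varepsilon$, where $t_{\mathcal{K}}$ and $t_\varepsilon$ are defined in the hypotheses of Proposition 5.5 and Proposition 5.6, respectively.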
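Let $\mathcal{K}_\varepsilon$ be a (deterministic) matrix, such that,

\[ \|\mathcal{K}_\varepsilon \mathcal{H} - \mathcal{K} \mathcal{H}\| \le \varepsilon \quad \text{and} \quad \|\mathcal{K}_\varepsilon - \mathcal{K}\| \le \varepsilon. \]

Then, for every $\delta > 0$, we may define the $\{\mathcal{F}_t\}$ adapted process $\{\mathcal{K}_t^\delta\}$, such that,

\[ \mathcal{K}_t^\delta = \begin{cases} \mathcal{K}_t & \text{if } t < t_\delta, \\ \mathcal{K}_t & \text{if } t \ge t_\delta \text{ and } \|\mathcal{K}_t \mathcal{H} - \mathcal{K} \mathcal{H}\| \vee \|\mathcal{K}_t - \mathcal{K}\| \le \varepsilon, \\ \mathcal{K}_\varepsilon & \text{otherwise.} \end{cases} \]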



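Also, for each $\delta > 0$, we define the $\{\mathcal{F}_t\}$ adapted process $\{z_t^\delta\}$ by the recursion

\[ z_{t+1}^\delta = \big( I_{NM} - \beta_t \overline{L} \otimes I_M - \alpha_t \mathcal{K}_t^\delta \mathcal{H} \big) z_t^\delta - \beta_t \big( \widetilde{L}_t \otimes I_M \big) z_t^\delta + \alpha_t \mathcal{K}_t^\delta \zeta_t, \]

with $z_0^\delta = z_0$. To show that the process $\{z_t\}$ (and, hence, $\{x_t\}$) is bounded a.s., we note that it suffices to show that the process $\{z_t^\delta\}$ is bounded a.s. for each $\delta > 0$. This is due to the fact that, by the definition of $t_\delta$, for each $\delta > 0$ we have

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \|\mathcal{K}_t^\delta - \mathcal{K}_t\| = 0 \Big) > 1 - \delta, \]

and, hence,

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \|z_t^\delta - z_t\| = 0 \Big) > 1 - \delta. \]

Thus the boundedness of the processes $\{z_t^\delta\}$ for each $\delta > 0$ would imply

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \|x_t\| < \infty \Big) > 1 - \delta \]

for every $\delta > 0$. The assertion of Lemma 5.4 would then follow by taking $\delta$ to zero.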

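Hence, in the following, we focus only on the processes $\{z_t^\delta\}$ and show that the latter are bounded a.s. for every $\delta > 0$. To this end, fix $\delta > 0$ and consider the $\{\mathcal{F}_t\}$ adapted process $\{V_t^\delta\}$ given by $V_t^\delta = \|z_t^\delta\|^2$. It can be shown (Assumption (A.3)) that

\[ \begin{aligned} \mathbb{E}_{\theta^*}\big[ V_{t+1}^\delta \mid \mathcal{F}_t \big] = {} & V_t^\delta + \beta_t^2 (z_t^\delta)^T \mathbb{E}_{\theta^*}\big[ (\widetilde{L}_t \otimes I_M)^2 \big] z_t^\delta + \alpha_t^2 \mathbb{E}_{\theta^*}\big[ \|\mathcal{K}_t^\delta \zeta_t\|^2 \mid \mathcal{F}_t \big] \\ & - 2 (z_t^\delta)^T \big( \beta_t \overline{L} \otimes I_M + \alpha_t \mathcal{K}_t^\delta \mathcal{H} \big) z_t^\delta + \beta_t^2 (z_t^\delta)^T (\overline{L} \otimes I_M)^2 z_t^\delta \\ & + \alpha_t^2 (z_t^\delta)^T (\mathcal{K}_t^\delta \mathcal{H})^T \mathcal{K}_t^\delta \mathcal{H} z_t^\delta + 2 \alpha_t \beta_t (z_t^\delta)^T (\overline{L} \otimes I_M)(\mathcal{K}_t^\delta \mathcal{H}) z_t^\delta. \end{aligned} \tag{5.3} \]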

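Since the Laplacians are bounded matrices by definition and the matrix $\mathcal{K}_t^\delta$ is bounded for $t \ge t_\delta$ by construction, there exists a constant $c_3 > 0$, sufficiently large, such that the inequalities

\[ (z_t^\delta)^T \mathbb{E}_{\theta^*}\big[ (\widetilde{L}_t \otimes I_M)^2 \big] z_t^\delta = (z_{t,\mathcal{C}^\perp}^\delta)^T \mathbb{E}_{\theta^*}\big[ (\widetilde{L}_t \otimes I_M)^2 \big] z_{t,\mathcal{C}^\perp}^\delta \le c_3 \|z_{t,\mathcal{C}^\perp}^\delta\|^2, \tag{5.4} \]

\[ (z_t^\delta)^T (\overline{L} \otimes I_M)^2 z_t^\delta = (z_{t,\mathcal{C}^\perp}^\delta)^T (\overline{L} \otimes I_M)^2 z_{t,\mathcal{C}^\perp}^\delta \le c_3 \|z_{t,\mathcal{C}^\perp}^\delta\|^2, \]

\[ (z_t^\delta)^T (\overline{L} \otimes I_M)(\mathcal{K}_t^\delta \mathcal{H}) z_t^\delta \le c_3 \|z_{t,\mathcal{C}^\perp}^\delta\| \, \|z_t^\delta\|, \]

\[ (z_t^\delta)^T (\mathcal{K}_t^\delta \mathcal{H})^T \mathcal{K}_t^\delta \mathcal{H} z_t^\delta \le c_3 \|z_t^\delta\|^2, \qquad \mathbb{E}_{\theta^*}\big[ \|\mathcal{K}_t^\delta \zeta_t\|^2 \mid \mathcal{F}_t \big] \le c_3 \tag{5.5} \]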
hold for all $t \ge t_\delta$, with $z_{t,\mathcal{C}^\perp}^\delta$ denoting the projection of $z_t^\delta$ on the subspace $\mathcal{C}^\perp$. Also, by Proposition 5.5 and Proposition 5.6, for $t \ge t_\delta$,



(zf ) (/3tL ® Im + atJCfn) zf > c^at ||z: 



where the positive constants $c_{\mathcal{K}}$ and $c_\varepsilon$ are defined in the hypotheses of Proposition 5.5 and Proposition 5.6, respectively. The inequalities (5.4)-(5.5) then lead to

2 



Ee. I Ft] <Vt~ (ckA - 2C3/3?) 



for all t > ts- Observing the decay rates of the various terms in (2.5), we conclude 
that there exists tg >ts, such that, 

cicPt - 2c3/3t^ > and c^at - ^atPtC^, - a^c^ > 0, 



for t > ts and, hence. 



(5.6) 



for all t > ts- Let us introduce the {J-t} adapted process {V^.}, such that, 



(5.7) 



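for $t \ge 0$. The process $\{\widetilde{V}_t^\delta\}$ is well-defined as the sequence $\{\alpha_t\}$ is square summable. From (5.6) it follows immediately that

\[ \mathbb{E}_{\theta^*}\big[ \widetilde{V}_{t+1}^\delta \mid \mathcal{F}_t \big] \le \widetilde{V}_t^\delta \]

for $t \ge t_\delta$. Hence, the process $\{\widetilde{V}_t^\delta\}_{t \ge t_\delta}$ is a supermartingale. Moreover, it is bounded from below, since $V_t^\delta \ge 0$ by construction, and, in fact,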



s=0 

for all $t \ge 0$. Thus $\{\widetilde{V}_t^\delta\}_{t \ge t_\delta}$ is a supermartingale that is bounded from below and, hence, converges a.s. to a finite random variable $\widetilde{V}^\delta$, i.e., $\widetilde{V}_t^\delta \to \widetilde{V}^\delta$ a.s. as $t \to \infty$. In particular, the process $\{\widetilde{V}_t^\delta\}$ is pathwise bounded. By (5.7) the process $\{V_t^\delta\}$ is also pathwise bounded. Thus, for each $\delta > 0$, the process $\{z_t^\delta\}$ is bounded a.s. and the assertion follows. □

The next result (see Appendix B for a proof) quantifies the rate at which the different agent estimates reach agreement:

Lemma 5.7. For every $\tau_0$ such that $0 \le \tau_0 < \tau_1 - \tau_2 - 1/(2 + \varepsilon_1)$, we have

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} (t+1)^{\tau_0} \big( x_n(t) - x_{avg}(t) \big) = 0 \Big) = 1, \]

with $x_{avg}(t) = (1/N) \sum_{n=1}^{N} x_n(t)$ denoting the instantaneous network-averaged estimate.
The rest of the section focuses on the convergence properties of the network-averaged estimate $\{x_{avg}(t)\}$ and completes the final steps required to establish the convergence properties of the agent estimates $\{x_n(t)\}$. The first result in this direction concerns the consistency of the average estimate sequence.

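Lemma 5.8. Under the additional assumption that $\tau_1 = 1$ (see (A.5)) we have

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} \big( x_{avg}(t) - \theta^* \big) = 0 \Big) = 1, \]

with $x_{avg}(t) = (1/N) \sum_{n=1}^{N} x_n(t)$ the instantaneous network-averaged estimate.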

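Proof. Let us denote by $z_t$ the residual $x_{avg}(t) - \theta^*$. The $\{\mathcal{F}_t\}$-adapted process $\{z_t\}$ may be shown to satisfy the recursion

\[ z_{t+1} = (I_M - \alpha_t \Gamma_t) z_t + \alpha_t U_t + \alpha_t J_t \tag{5.8} \]

with $\{\Gamma_t\}$, $\{U_t\}$ being $\{\mathcal{F}_t\}$-adapted, $\{J_t\}$ being $\{\mathcal{F}_{t+1}\}$-adapted and given by

\[ \Gamma_t = \frac{1}{N} \sum_{n=1}^{N} K_n(t) H_n, \quad U_t = -\frac{1}{N} \sum_{n=1}^{N} K_n(t) H_n \big( x_n(t) - x_{avg}(t) \big) \quad \text{and} \quad J_t = \frac{1}{N} \sum_{n=1}^{N} K_n(t) \zeta_n(t) \tag{5.9} \]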

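respectively. Now fix $0 < \tau_0 < \tau_1 - \tau_2 - 1/(2 + \varepsilon_1)$. By the convergence of the gain processes and Lemma 5.7, $\Gamma_t \to I_M$ and $(t+1)^{\tau_0} U_t \to 0$ a.s. as $t \to \infty$. By Egorov's theorem the a.s. convergence may be assumed to be uniform on sets of arbitrarily large probability measure and, hence, for every $\delta > 0$, there exist uniformly bounded processes $\{\Gamma_t^\delta\}$, $\{U_t^\delta\}$, and $\{\mathcal{K}_t^\delta\}$ satisfying

\[ \mathbb{P}_{\theta^*}\Big( \sup_{s \ge t_\delta} \|\Gamma_s^\delta - I_M\| \vee \|\mathcal{K}_s^\delta - \mathcal{K}\| > \varepsilon \Big) = 0 \quad \text{and} \quad \mathbb{P}_{\theta^*}\Big( \sup_{s \ge t_\delta} (s+1)^{\tau_0} \|U_s^\delta\| > \varepsilon \Big) = 0 \]

for each $\varepsilon > 0$ and some $t_\delta$ (sufficiently large), such that

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \|\Gamma_t^\delta - \Gamma_t\| \vee \|\mathcal{K}_t^\delta - \mathcal{K}_t\| \vee \|U_t^\delta - U_t\| = 0 \Big) > 1 - \delta. \]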

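Also, for each $\delta > 0$, define the $\{\mathcal{F}_t\}$-adapted process $\{z_t^\delta\}$ by

\[ z_{t+1}^\delta = (I_M - \alpha_t \Gamma_t^\delta) z_t^\delta + \alpha_t U_t^\delta + \alpha_t J_t^\delta \tag{5.10} \]

with $z_0^\delta = z_0$ and $J_t^\delta = (1/N) \sum_{n=1}^{N} K_n^\delta(t) \zeta_n(t)$, so that

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \|z_t^\delta - z_t\| = 0 \Big) > 1 - \delta. \tag{5.11} \]

By the above development, to show that $z_t \to 0$ as $t \to \infty$, it suffices to show that $z_t^\delta \to 0$ as $t \to \infty$ for each $\delta > 0$. Hence, in the following, we focus on the process $\{z_t^\delta\}$ only for a fixed but arbitrary $\delta > 0$.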

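Now let $\{V_t^\delta\}$ denote the $\{\mathcal{F}_t\}$ adapted process such that $V_t^\delta = \|z_t^\delta\|^2$ for all t. Using the fact that $\mathbb{E}_{\theta^*}[\zeta_t \mid \mathcal{F}_t] = 0$ for all t, it follows that

\[ \mathbb{E}_{\theta^*}\big[ V_{t+1}^\delta \mid \mathcal{F}_t \big] \le \|I_M - \alpha_t \Gamma_t^\delta\|^2 V_t^\delta + 2 \alpha_t (U_t^\delta)^T (I_M - \alpha_t \Gamma_t^\delta) z_t^\delta + \alpha_t^2 \|U_t^\delta\|^2 + \alpha_t^2 \mathbb{E}_{\theta^*}\big[ \|J_t^\delta\|^2 \mid \mathcal{F}_t \big]. \tag{5.12} \]

For t large enough,

\[ \big| 2 \alpha_t (U_t^\delta)^T (I_M - \alpha_t \Gamma_t^\delta) z_t^\delta \big| \le 2 \alpha_t \|U_t^\delta\| \, \|z_t^\delta\| \le 2 \alpha_t \|U_t^\delta\| \, \|z_t^\delta\|^2 + 2 \alpha_t \|U_t^\delta\|. \tag{5.13} \]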



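Then, making $t_\delta$ larger (if necessary) such that $\|U_t^\delta\| \le \varepsilon (t+1)^{-\tau_0}$, $\mathbb{E}_{\theta^*}[\|J_t^\delta\|^2 \mid \mathcal{F}_t]$ is uniformly bounded, and (5.13) holds for all $t \ge t_\delta$, it follows from (5.12)-(5.13) that there exist positive constants $c_1$ and $c_2$ so that

\[ \mathbb{E}_{\theta^*}\big[ V_{t+1}^\delta \mid \mathcal{F}_t \big] \le \big( 1 - c_1 \alpha_t + c_2 \alpha_t (t+1)^{-\tau_0} \big) V_t^\delta + c_2 \big( \alpha_t (t+1)^{-\tau_0} + \alpha_t^2 (t+1)^{-2\tau_0} + \alpha_t^2 \big) \]

for all $t \ge t_\delta$. Since $0 < \tau_0 < \tau_1$, the first term inside the second set of parentheses on the right hand side dominates; by choosing $c_4 > c_2$ and $c_3 < c_1$ appropriately, we have

\[ \mathbb{E}_{\theta^*}\big[ V_{t+1}^\delta \mid \mathcal{F}_t \big] \le (1 - c_3 \alpha_t) V_t^\delta + c_4 \alpha_t (t+1)^{-\tau_0} \le V_t^\delta + c_4 \alpha_t (t+1)^{-\tau_0} \tag{5.14} \]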



for all t > if. Now consider the {J-t} adapted process {l^i}, such that, 



(5.15) 



for $t \ge 0$. Since $\tau_1 = 1$ and $\tau_0 > 0$, the sequence $\{\alpha_t (t+1)^{-\tau_0}\}$ is summable and the process $\{\widetilde{V}_t^\delta\}$ is bounded from below. It is readily seen that $\{\widetilde{V}_t^\delta\}_{t \ge t_\delta}$ is a supermartingale and, hence, converges a.s. to a finite random variable. By (5.15), the process $\{V_t^\delta\}$ also converges a.s. to a finite random variable $V^\delta$ (necessarily non-negative). Finally, from (5.14),



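\[ \mathbb{E}_{\theta^*}\big[ V_{t+1}^\delta \big] \le (1 - c_3 \alpha_t) \mathbb{E}_{\theta^*}\big[ V_t^\delta \big] + c_4 \alpha_t (t+1)^{-\tau_0} \]

for $t \ge t_\delta$. Since $\tau_0 > 0$, the sequence $\{\alpha_t (t+1)^{-\tau_0}\}$ decays faster than $\{\alpha_t\}$ and, hence, by Lemma 4.1 we have $\mathbb{E}_{\theta^*}[V_t^\delta] \to 0$ as $t \to \infty$. The sequence $\{V_t^\delta\}$ is non-negative, so by Fatou's lemma we further conclude that

\[ 0 \le \mathbb{E}_{\theta^*}\big[ V^\delta \big] \le \liminf_{t \to \infty} \mathbb{E}_{\theta^*}\big[ V_t^\delta \big] = 0, \]

where $V^\delta$ denotes the a.s. limit of $\{V_t^\delta\}$. The above implies $V^\delta = 0$ a.s. by the non-negativity of $V^\delta$. Hence $\|z_t^\delta\| \to 0$ a.s. as $t \to \infty$ and the desired assertion follows. □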
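
The last step of the proof rests on a deterministic recursion of the form $v_{t+1} \le (1 - c_3 \alpha_t) v_t + c_4 \alpha_t (t+1)^{-\tau_0}$ being driven to zero. The following Python sketch iterates this recursion with equality, purely as a numerical illustration; the constants $c_3 = c_4 = 1$, $\tau_0 = 0.25$, the initial value, and the choice $\alpha_t = 1/(t+1)$ are assumptions made for the example and are not taken from the paper.

```python
def mean_square_bound(horizon, c3=1.0, c4=1.0, tau0=0.25):
    """Iterate v(t+1) = (1 - c3*a_t) v(t) + c4*a_t*(t+1)**(-tau0) with a_t = 1/(t+1)."""
    v = 1.0  # arbitrary (assumed) initial value
    for t in range(horizon):
        alpha = 1.0 / (t + 1)
        v = (1.0 - c3 * alpha) * v + c4 * alpha * (t + 1) ** (-tau0)
    return v


# The iterate shrinks as the horizon grows, consistent with convergence to zero.
values = [mean_square_bound(T) for T in (10**3, 10**4, 10**5)]
```

With these assumed constants the iterate decays roughly like $(t+1)^{-\tau_0}$, so the decay is slow but monotone across the three horizons.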

By inductive reasoning, we now obtain a stronger version of Lemma 5.8 that quantifies the convergence rate in the above (see Appendix B for a proof).

Lemma 5.9. Let assumptions (A.1)-(A.5) hold with ti = \ and a>\. Then, 
for each n and $\tau \in [0, 1/2)$,

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} (t+1)^{\tau} \|x_n(t) - \theta^*\| = 0 \Big) = 1. \tag{5.16} \]



6. Proofs of Main Results. The proof of Theorem [O] is a direct consequence 
of the triangle inequality and Lemma |5.7| since all agent estimates converge to the 
network-averaged estimate at the required rate. 

Proof of Theorem [HH] 
Proof. Since $\varepsilon_1 > 0$, $\tau_1 = 1$ and $\tau_1 > \tau_2 + 1/(2 + \varepsilon_1) + 1/2$, from Lemma 5.7 there exists $\varepsilon > 0$ (sufficiently small) such that

\[ \mathbb{P}_{\theta^*}\Big( \lim_{t \to \infty} (t+1)^{1/2 + \varepsilon} \|x_n(t) - x_{avg}(t)\| = 0 \Big) = 1 \]

for all n. Moreover, by Lemma 5.9, for each $\tau \in [0, 1/2)$, we have $(t+1)^{\tau} \|x_{avg}(t) - \theta^*\| \to 0$ a.s. as $t \to \infty$. Since $\tau < 1/2 + \varepsilon$, an immediate application of the triangle inequality yields the required estimate convergence rate. □

Proof of Theorem 3.3

We will use the following result from [40] concerning the asymptotic normality of non-Markov stochastic recursions. The statement here is somewhat less general than in [40] but serves our application and eases the additional notational complexity.

Lemma 6.1 (Theorem 2.2 in [40]). Let $\{z_t\}$ be an $\mathbb{R}^k$ valued $\{\mathcal{F}_t\}$ adapted process that satisfies

\[ z_{t+1} = \Big( I_k - \frac{1}{t+1} \Gamma_t \Big) z_t + (t+1)^{-1} \Phi_t V_t + (t+1)^{-3/2} T_t, \]

where $\{V_t\}$ and $\{T_t\}$ are $\mathbb{R}^k$ valued stochastic processes, such that, for each t, $V_{t-1}$ and $T_t$ are $\mathcal{F}_t$-adapted, and where the processes $\{\Gamma_t\}$ and $\{\Phi_t\}$ are $\mathbb{R}^{k \times k}$ valued and $\{\mathcal{F}_t\}$ adapted. Assume

\[ \Gamma_t \to I_k, \quad \Phi_t \to \Phi \quad \text{and} \quad T_t \to 0 \quad \text{as } t \to \infty. \]

Let the sequence $\{V_t\}$ satisfy $\mathbb{E}[V_t \mid \mathcal{F}_t] = 0$ for each t, and let there exist a constant $C > 0$ and a matrix $\Sigma$ such that $C > \|\mathbb{E}[V_t V_t^T \mid \mathcal{F}_t] - \Sigma\| \to 0$ as $t \to \infty$, and, with

\[ \sigma_{t,r}^2 = \int_{\|V_t\|^2 \ge r(t+1)} \|V_t\|^2 \, d\mathbb{P}, \tag{6.1} \]

let $\lim_{t \to \infty} \frac{1}{t+1} \sum_{s=0}^{t} \sigma_{s,r}^2 = 0$ for every $r > 0$. Then, the asymptotic distribution of $(t+1)^{1/2} z_t$ is normal with mean 0 and covariance matrix $\Phi \Sigma \Phi^T$.

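Proof. [Proof of Theorem 3.3] The residual process $\{z_t\}$ and its $\delta$-approximations $\{z_t^\delta\}$ are given in (5.8)-(5.10). With $\tau_1 = a = 1$,

\[ z_{t+1} = \Big( I_M - \frac{1}{t+1} \Gamma_t \Big) z_t + (t+1)^{-1} U_t + (t+1)^{-1} J_t, \]

where $U_t$ and $J_t$ are defined in (5.8)-(5.10). Since $J_t = (1/N) \sum_{n=1}^{N} K_n(t) \zeta_n(t)$ and the $\{K_n(t)\}$'s may not converge uniformly (both in time and space), Lemma 6.1 is not applicable directly. Hence, we first consider the process $\{z_t^\delta\}$ for some $\delta > 0$. In order to apply Lemma 6.1 to the process $\{z_t^\delta\}$, define

\[ T_t = (t+1)^{1/2} U_t^\delta \]

for each t. Note that by (B.22) $\|U_t^\delta\| = o\big( (t+1)^{-1/2} \big)$ and, hence, $T_t \to 0$ as $t \to \infty$. Also define

\[ \Phi_t = I_M \quad \text{and} \quad V_t = J_t^\delta \]

for each t. Clearly, $\mathbb{E}_{\theta^*}[V_t \mid \mathcal{F}_t] = 0$ for all t. By the convergence of $\mathcal{K}_t^\delta$ to $\mathcal{K}$,

\[ \lim_{t \to \infty} \mathbb{E}_{\theta^*}\big[ V_t V_t^T \mid \mathcal{F}_t \big] = \lim_{t \to \infty} \frac{1}{N^2} \sum_{n=1}^{N} K_n^\delta(t) R_n \big( K_n^\delta(t) \big)^T = (N \Sigma_c)^{-1}, \]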
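where the last step follows from Lemma 5.1. Moreover, the uniform boundedness of the process $\{\mathcal{K}_t^\delta\}$ implies the existence of a constant $C > 0$ such that

\[ \big\| \mathbb{E}_{\theta^*}\big[ V_t V_t^T \mid \mathcal{F}_t \big] - \Sigma \big\| \le C, \qquad \Sigma = (N \Sigma_c)^{-1}, \]

for all $t \ge 0$. The $\{V_t\}$ thus constructed also satisfies the uniform integrability assumption (6.1) due to the i.i.d. nature of the noise processes and the uniform boundedness of $\{\mathcal{K}_t^\delta\}$. Thus, the process $\{z_t^\delta\}$ falls under the purview of Lemma 6.1 with $\Phi = I_M$ and $\Sigma = (N \Sigma_c)^{-1}$. We thus conclude that

\[ (t+1)^{1/2} z_t^\delta \Longrightarrow \mathcal{N}\big( 0, (N \Sigma_c)^{-1} \big) \]

for each $\delta > 0$. To extend this asymptotic normality to the process $\{z_t\}$, consider any bounded continuous function $f : \mathbb{R}^M \to \mathbb{R}$. By weak convergence (Portmanteau's theorem, [41]) we have

\[ \lim_{t \to \infty} \mathbb{E}_{\theta^*}\big[ f\big( (t+1)^{1/2} z_t^\delta \big) \big] = \mathbb{E}_{\theta^*}\big[ f(z^*) \big] \tag{6.2} \]

for each $\delta$, where $z^*$ denotes a $\mathcal{N}\big( 0, (N \Sigma_c)^{-1} \big)$ distributed random vector under the measure $\mathbb{P}_{\theta^*}$. Denoting by $\|f\|_\infty$ the sup-norm of $f(\cdot)$ (necessarily finite) we obtain from (5.11)

\[ \Big| \mathbb{E}_{\theta^*}\big[ f\big( (t+1)^{1/2} z_t \big) \big] - \mathbb{E}_{\theta^*}\big[ f\big( (t+1)^{1/2} z_t^\delta \big) \big] \Big| \le 2 \delta \|f\|_\infty. \]

By (6.2) we then have

\[ \limsup_{t \to \infty} \Big| \mathbb{E}_{\theta^*}\big[ f\big( (t+1)^{1/2} z_t \big) \big] - \mathbb{E}_{\theta^*}\big[ f(z^*) \big] \Big| \le 2 \delta \|f\|_\infty. \]

Since the above holds for each $\delta > 0$, we conclude that $\mathbb{E}_{\theta^*}[f((t+1)^{1/2} z_t)] \to \mathbb{E}_{\theta^*}[f(z^*)]$ as $t \to \infty$. This convergence holds for all bounded continuous functions $f(\cdot)$, thus giving the required weak convergence of the sequence $\{(t+1)^{1/2} z_t\}$. □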
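
The normalization in Lemma 6.1 can be illustrated on the simplest scalar instance of the recursion. The Python sketch below is a Monte Carlo illustration under assumed settings, not the paper's estimator: it takes $\Gamma_t = \Phi_t = 1$, $T_t = 0$, and i.i.d. Gaussian $V_t$ with a hypothetical standard deviation `SIGMA`, in which case $z_t$ is a running sample mean and the variance of the root-time-scaled residual should be close to $\sigma^2$.

```python
import math
import random


def normalized_residual(horizon, sigma, rng):
    """Run z(t+1) = (1 - 1/(t+1)) z(t) + V_t/(t+1); return sqrt(horizon) * z(horizon)."""
    z = 0.0
    for t in range(horizon):
        gain = 1.0 / (t + 1)
        z = (1.0 - gain) * z + gain * rng.gauss(0.0, sigma)
    return math.sqrt(horizon) * z


def empirical_variance(samples):
    """Unbiased sample variance."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
SIGMA = 1.5             # hypothetical noise standard deviation
samples = [normalized_residual(200, SIGMA, rng) for _ in range(2000)]
variance_estimate = empirical_variance(samples)
```

Comparing `variance_estimate` with `SIGMA ** 2` gives a quick sanity check of the scaling; the sketch does not model the network, the adaptive gains, or the perturbation $T_t$.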

7. Conclusion. We have developed a distributed estimator that combines a recursive collaborative learning step with the estimate update task. Through this learning process, the agents adaptively improve their quantitative model information and innovation gains with a view toward achieving the performance of the optimal centralized estimator. Intuitively, the distributed approach combines two potentials, the agreement (or consensus) and the innovation. By properly designing the relative strength of their excitations, we have shown that the agent estimates may be made asymptotically efficient: their asymptotic covariance coincides with the asymptotic covariance (the inverse of the Fisher information rate for Gaussian systems) of a centralized estimator with perfect statistical information and access to all agent observations at all times. A typical application scenario involves multi-sensor distributed platforms, for example, the smart grid or vehicular networks. Such networks are generally equipped with rich sensing infrastructures and high sensing diversity, but suffer from a lack of information about the global model and about the relative observation efficiencies due to unpredictable changes and constraints in the sensing resources. Extensions of this work to nonlinear sensing platforms are currently being investigated. Another important direction will be the extension of this adaptive collaborative scheme to dynamic parameter situations, as opposed to the static case considered in this paper.



Appendix A. Proofs in Section 4.

Proof. [Proof of Lemma 4.2] We start by showing that for each positive integer k, the following holds:

\[ \lim_{t \to \infty} (t+1)^{k(\delta_2 - \delta_1 - \varepsilon_0)} \mathbb{E}\big[ z_t^k \big] = 0 \tag{A.1} \]

for every $0 < \varepsilon_0 < \delta_2 - \delta_1$. The proof proceeds by induction on k. Let us first consider $k = 1$. We then have

\[ \mathbb{E}[z_{t+1}] \le \mathbb{E}\big[ (1 - \mathbb{E}[r_1(t) \mid \mathcal{F}_t]) z_t \big] + r_2(t) \le (1 - \overline{r}_1(t)) \mathbb{E}[z_t] + r_2(t), \tag{A.2} \]

where by $\overline{r}_1(t)$ we denote the quantity $a_1/(t+1)^{\delta_1}$. The deterministic $\mathbb{R}_+$ valued sequence $\{\mathbb{E}[z_t]\}$ satisfies the conditions of Lemma 4.1 and the claim in (A.1) holds for $k = 1$. Now assume the claim in (A.1) holds for all $k \le k_0$, with $k_0$ a positive integer. We now show that the claim also holds for $k = k_0 + 1$. Indeed, by the polynomial expansion



fco + l _ \^ / Ko + i 

i=0 



((l-n(i))z,) 



and the fact that < ri(t) < 1, we have 



ko + 1 



In a way similar to (A.2l, the above implies 
E 



(A.3) 



By the induction hypothesis and the assumptions on the sequence {r2{t)}, there exist con- 
stants Ci for i — 1, - ■ ■ , fco + l, such that. 



E 



< 



(t + l)(fco + l-i)('52-<5l-eo)+i'52 (t + l){ko + l)(S2-Si-eo)+i(Si+ea) 



(A.4) 



for alH = 1, ■ • ■ , fco + 1. It is readily seen that the smallest decay rate in the above is attained 
at i = 1. Hence, from (|A.3|l-(|A.4l), there exists another constant cq, such that. 



E 



Co 



The deterministic sequence |e j^z^^^^j | then falls under the purview of Lemma |4.l| (by 

taking S2 = (fco + 1)('52 — 5i — eo) + {Si + eo) and Si = 5i. Since so > 0, an immediate 
application of Lemma |4.1| gives 



lim (t + i)('=o+i)('52-5i-Eo)]E 

t — yoo 



[z^'+^ 



= 



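and the induction step follows. This establishes the desired claim in (A.1).

We now complete the proof of Lemma 4.2. To this end, choose $\delta$, such that $0 < \delta < \delta_2 - \delta_1 - \delta_0$. Let $k_\delta$ be a positive integer, such that $k_\delta(\delta_2 - \delta_1 - \delta_0 - \delta) > 1$. Then, for every $\varepsilon > 0$, we have

\[ \mathbb{P}\big( (t+1)^{\delta_0} z_t > \varepsilon \big) \le \frac{(t+1)^{k_\delta \delta_0} \, \mathbb{E}\big[ z_t^{k_\delta} \big]}{\varepsilon^{k_\delta}} \le \frac{c}{\varepsilon^{k_\delta} (t+1)^{k_\delta(\delta_2 - \delta_1 - \delta_0 - \delta)}}. \tag{A.5} \]

The last step is a consequence of the claim in (A.1), by which there exists a constant $c > 0$, such that,

\[ \mathbb{E}\big[ z_t^{k_\delta} \big] \le \frac{c}{(t+1)^{k_\delta(\delta_2 - \delta_1 - \delta)}} \]

for all $t \ge 0$. Since $k_\delta(\delta_2 - \delta_1 - \delta_0 - \delta) > 1$ by choice, the rightmost term in (A.5) is summable in t. We thus obtain $\sum_{t \ge 0} \mathbb{P}\big( (t+1)^{\delta_0} z_t > \varepsilon \big) < \infty$, and, hence,

\[ \mathbb{P}\big( (t+1)^{\delta_0} z_t > \varepsilon \ \text{i.o.} \big) = 0 \tag{A.6} \]

by the Borel-Cantelli lemma (i.o. stands for infinitely often in (A.6)). Since (A.6) holds for arbitrary $\varepsilon > 0$, we conclude that $(t+1)^{\delta_0} z_t \to 0$ a.s. as $t \to \infty$. □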

Proof. [Proof of Lemma 



4.3 



Fix 5 e 0, ^2 - <5i - (5o 



The following is readily 



verified: 

For every £3 > 0, there exists R^^ > 0, such that 



sup 

Indeed, for any £2 > 0, we note that 



+s 



\Ut{l + Jt)\\ <Re,\ > l-£3 



(A.7) 



\Jt\\ > £2 < 



-2+ei 



(t+ l)l+'5{2+'^l) 



rE \\Jt 



||2+eil 



£2+"^(t+ 



Since 5 > 0, the term on the right hand side of (A.S I is summable, and by the Borel-Cantelli 
lemma we may conclude that 



Jt II > £2 i.o. = 



Since £2 is arbitrary, it follows that 



-+s 



lirn 



\Jt\\ = =1 



From the boundedness of {Ut} and (A.9I we may further conclude that 

1 



lim 

t — ^oc 



2 + ei 



+ S 



\Ut{l + Jt)\\ = =1 



(A.9) 



(A.IO) 



By Egorov's theorem the a.s. convergence in (A.IO I is uniform except on a set of arbitrarily 
small measure, which verifies the claim in (A.7I. 

We now establish the desired result by a truncation argument. For a scalar a, define its truncation $(a)^C$ at level $C > 0$ by

\[ (a)^C = \begin{cases} \dfrac{a}{|a|} \min(|a|, C) & \text{if } a \ne 0, \\ 0 & \text{if } a = 0. \end{cases} \tag{A.11} \]

For a vector, the truncation operation applies component-wise. Now, for each $C > 0$, consider the sequence $\{\widehat{z}_C(t)\}$ given by the recursion

\[ \widehat{z}_C(t+1) = (1 - r_1(t)) \widehat{z}_C(t) + r_2(t) \big( U_t (1 + J_t) \big)^C \tag{A.12} \]



with zc(0) = zq. Using (A. 11 1, we have 



ic{t + l) < {l-ri{t))zc{t)+?2{t), 



(A.13) 



where 



r2{t) < 



ki 



2 + ei 



(A.14) 



for some constant fci > 0. By construction the process {zc{t)} is {J-t} adapted and, hence, 
the recursion in ( A.13 1-( A.14 1 falls under the purview of Lemma 4.2 Thus, for every C > 0, 
we have {t + lY°zc{t) 



a.s. as t — )■ oo, since So < S2 — Si — S - 



i_ 

2+ei 



Now, for £3 > 0, consider the sequence {zr^^ (f)}, where R^^ > is the constant in ( A.7I 
Using (A.7l and (A.12I we may conclude that 



inf (z_R (t) 

t>o ^ ^ 



> 1 - £3- 



Since all processes involved are non-negative, it readily follows from (A. 15 1 that 

Pf lim {t+lf«zt =0) > 1-53. 



(A.15) 



(A.16) 



The lemma follows by taking $\varepsilon_3$ to zero in (A.16). □

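Proof. [Proof of Lemma 4.4] Let $\mathcal{L}$ denote the set of possible Laplacian matrices (necessarily finite) and $\mathcal{D}$ the distribution on $\mathcal{L}$ induced by the link formation process. Since the set of Laplacian matrices is finite, the set $\mathcal{L}$ may be chosen such that $p = \inf_{L \in \mathcal{L}} p_L > 0$, with $p_L = \mathbb{P}(L_t = L)$ for each $L \in \mathcal{L}$ and $\sum_{L \in \mathcal{L}} p_L = 1$. The hypothesis $\lambda_2(\overline{L}) > 0$ implies that for every $z \in \mathcal{C}^\perp$,

\[ \sum_{L \in \mathcal{L}} z^T L z \ge \sum_{L \in \mathcal{L}} z^T (p_L L) z = z^T \overline{L} z \ge \lambda_2(\overline{L}) \|z\|^2. \tag{A.17} \]

Denoting by $|\mathcal{L}|$ the cardinality of $\mathcal{L}$, it follows from (A.17) that for each $z \in \mathcal{C}^\perp$ there exists some $L_z \in \mathcal{L}$, such that $z^T L_z z \ge (\lambda_2(\overline{L})/|\mathcal{L}|) \|z\|^2$. Moreover, since the set $\mathcal{L}$ is finite, the mapping $L_z : \mathcal{C}^\perp \to \mathcal{L}$ may be realized as a measurable function.

For each $L \in \mathcal{L}$, the eigenvalues of the matrix $I_{NM} - \beta_t L \otimes I_M$ are 1 and $1 - \beta_t \lambda_n(L)$, $2 \le n \le N$, each being repeated M times. Hence, for $t \ge t_0$ (large enough), $\|I_{NM} - \beta_t L \otimes I_M\| \le 1$ and $\|(I_{NM} - \beta_t L \otimes I_M) z\| \le \|z\|$ for every $z \in \mathbb{R}^{NM}$. Hence, the functional $r_{L,z}$ given by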



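\[ r_{L,z} = \begin{cases} 1 & \text{if } t < t_0 \text{ or } z = 0, \\ 1 - \dfrac{\|(I_{NM} - \beta_t L \otimes I_M) z\|}{\|z\|} & \text{otherwise,} \end{cases} \tag{A.18} \]

is jointly measurable in L and z and satisfies $0 \le r_{L,z} \le 1$ for each pair $(L, z)$. Let $\{r_t\}$ be the $\{\mathcal{F}_{t+1}\}$ adapted process given by $r_t = r_{L_t, z_t}$ for each t, and $\|(I_{NM} - \beta_t L_t \otimes I_M) z_t\| = (1 - r_t) \|z_t\|$ a.s. for each t. We now need to verify that $\{r_t\}$ satisfies (4.1) for some $c_r > 0$. To this end, for t large enough,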



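\[ \begin{aligned} \|(I_{NM} - \beta_t L_{z_t} \otimes I_M) z_t\|^2 &= z_t^T (I_{NM} - 2 \beta_t L_{z_t} \otimes I_M) z_t + \beta_t^2 z_t^T (L_{z_t} \otimes I_M)^2 z_t \\ &\le \big( 1 - 2 \beta_t \lambda_2(\overline{L})/|\mathcal{L}| \big) \|z_t\|^2 + c_1 \beta_t^2 \|z_t\|^2 \\ &\le \big( 1 - \beta_t \lambda_2(\overline{L})/|\mathcal{L}| \big) \|z_t\|^2, \end{aligned} \]

where we have used the definition of the function $L_z$, the boundedness of the Laplacian matrix and the fact that $\beta_t \to 0$. Hence, by making $t_0$ larger if necessary, we have

\[ \|(I_{NM} - \beta_t L_{z_t} \otimes I_M) z_t\| \le \Big( 1 - \beta_t \frac{\lambda_2(\overline{L})}{4 |\mathcal{L}|} \Big) \|z_t\| \tag{A.19} \]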



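for all $t \ge t_0$. Now, by (A.19),

\[ \mathbb{E}\big[ \|(I_{NM} - \beta_t L_t \otimes I_M) z_t\| \mid \mathcal{F}_t \big] = \sum_{L \in \mathcal{L}} p_L (1 - r_{L, z_t}) \|z_t\| \le \Big( 1 - p_{L_{z_t}} \beta_t \frac{\lambda_2(\overline{L})}{4 |\mathcal{L}|} - \sum_{L \ne L_{z_t}} p_L r_{L, z_t} \Big) \|z_t\|. \]

Since $\sum_{L \ne L_{z_t}} p_L r_{L, z_t} \ge 0$, we have for $t \ge t_0$,

\[ (1 - \mathbb{E}[r_t \mid \mathcal{F}_t]) \|z_t\| = \mathbb{E}\big[ \|(I_{NM} - \beta_t L_t \otimes I_M) z_t\| \mid \mathcal{F}_t \big] \le \Big( 1 - p \beta_t \frac{\lambda_2(\overline{L})}{4 |\mathcal{L}|} \Big) \|z_t\|. \]

Since, by definition, $r_t = 1$ on the set $\{z_t = 0\}$, it follows that

\[ \mathbb{E}[r_t \mid \mathcal{F}_t] \ge p \beta_t \frac{\lambda_2(\overline{L})}{4 |\mathcal{L}|} \]

for all $t \ge t_0$, thus establishing the assertion. □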
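
The construction in the proof can be made concrete on a small gossip example. The Python sketch below is a hedged illustration with assumed values ($N = 3$, $M = 1$, $\beta_t = 0.1$, one link drawn uniformly from the three possible links); the helper names are hypothetical. It evaluates $r_{L,z} = 1 - \|(I - \beta L) z\| / \|z\|$ for each single-link Laplacian and averages over the draws. The lower bound used for comparison, $p \beta \lambda_2(\overline{L}) / (4 |\mathcal{L}|)$, is the form suggested by the argument above; here $p = 1/3$, $|\mathcal{L}| = 3$, and $\lambda_2(\overline{L}) = 1$, so it equals $\beta / 36$.

```python
import math


def laplacian(num_nodes, edge):
    """Laplacian of the graph on num_nodes vertices containing a single edge."""
    u, v = edge
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    L[u][u] = L[v][v] = 1.0
    L[u][v] = L[v][u] = -1.0
    return L


def norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def contraction(L, z, beta):
    """r_{L,z} = 1 - ||(I - beta L) z|| / ||z|| for a nonzero vector z."""
    n = len(z)
    image = [z[i] - beta * sum(L[i][j] * z[j] for j in range(n)) for i in range(n)]
    return 1.0 - norm(image) / norm(z)


def expected_contraction(z, beta, edges, num_nodes=3):
    """Average of r_{L,z} over uniformly drawn single-edge Laplacians."""
    total = sum(contraction(laplacian(num_nodes, e), z, beta) for e in edges)
    return total / len(edges)


# Assumed setting: N = 3, M = 1, uniform gossip over the three possible links.
EDGES = [(0, 1), (1, 2), (0, 2)]
BETA = 0.1
LOWER_BOUND = BETA / 36.0  # p * beta * lambda_2(mean L) / (4 |L|) with p = 1/3, |L| = 3

# Test vectors in the orthogonal complement of the consensus subspace (entries sum to zero).
test_vectors = [[1.0, -1.0, 0.0], [1.0, 1.0, -2.0], [0.3, -0.5, 0.2]]
expected_values = [expected_contraction(z, BETA, EDGES) for z in test_vectors]

# A single disconnected instance can give no contraction at all for some z.
no_contraction = contraction(laplacian(3, (0, 1)), [1.0, 1.0, -2.0], BETA)
```

The value `no_contraction` is zero because agents 0 and 1 already agree in that test vector, which mirrors the failure of the instance-wise choice (4.2), while the averaged contraction stays strictly positive for every test vector.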
Appendix B. Proofs in Section 5.

Proof. [Proof of Lemma 5.2] We will show the desired convergence in the matrix Frobenius norm (denoted by $\|\cdot\|_F$ in the following). Since the matrix space under consideration is finite dimensional, the convergence in the $\ell_2$ norm will follow. The existence of quadratic moments implies the convergence of the sample covariances (see (2.3)) to the true covariances and, hence, for each n, $Q_n(t) \to R_n$ a.s. Since, in addition, the sequence $\{\gamma_t\}$ in (2.2) goes to zero, we may choose an a.s. finite random variable $R_2$, such that for each n,

\[ \mathbb{P}_{\theta^*}\Big( \sup_{t \ge 0} \big\| H_n^T \big( Q_n(t) + \gamma_t I_{M_n} \big)^{-1} H_n \big\| \le R_2 < \infty \Big) = 1. \tag{B.1} \]



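By construction, the matrix sequences $\{G_n(t)\}$ and $\{Q_n(t)\}$ are symmetric for each n. Let $\widetilde{G}_n(t) = G_n(t) - G_{avg}(t)$ denote the deviation of the Grammian estimate at agent n from the instantaneous network average $G_{avg}(t)$. Also, let $\widetilde{G}_t$ and $D_t$ respectively denote the matrices $[\widetilde{G}_1(t), \cdots, \widetilde{G}_N(t)]^T$ and $[D_1(t), \cdots, D_N(t)]^T$, where $D_n(t) = H_n^T (Q_n(t) + \gamma_t I_{M_n})^{-1} H_n$ for each n. Using the following readily verifiable properties of the Laplacian:

\[ (1_N \otimes I_M)^T (L_t \otimes I_M) = 0, \qquad (L_t \otimes I_M)(1_N \otimes G_{avg}(t)) = 0, \tag{B.2} \]

we have

\[ \widetilde{G}_{t+1} = \big( I_{NM} - \beta_t (L_t \otimes I_M) - \alpha_t I_{NM} \big) \widetilde{G}_t + \alpha_t \big( D_t - 1_N \otimes D_{avg}(t) \big), \tag{B.3} \]

where $D_{avg}(t) = (1/N) \sum_{n=1}^{N} D_n(t)$. Note that, by (B.1), there exists an $\{\mathcal{F}_t\}$ adapted a.s. bounded process $\{U_t\}$, such that $\|D_t - 1_N \otimes D_{avg}(t)\|_F \le U_t$ a.s. for all $t \ge 0$. For $m \in \{1, \cdots, M\}$, let $\widetilde{G}_{m,t}$ denote the m-th column of $\widetilde{G}_t$. The process $\{\widetilde{G}_{m,t}\}$ is $\{\mathcal{F}_t\}$ adapted and $\widetilde{G}_{m,t} \in \mathcal{C}^\perp$ for each t. Then, by Lemma 4.4 there exists a $[0, 1]$-valued $\{\mathcal{F}_{t+1}\}$ adapted process $\{r_{m,t}\}$, such that,

\[ \|(I_{NM} - \beta_t L_t \otimes I_M) \widetilde{G}_{m,t}\| \le (1 - r_{m,t}) \|\widetilde{G}_{m,t}\| \]

and $\mathbb{E}_{\theta^*}[r_{m,t} \mid \mathcal{F}_t] \ge c_{m,r}/(t+1)^{\tau_2}$ a.s. for $t \ge t_0$ sufficiently large. Noting that the square of the Frobenius norm is the sum of the squared column $\ell_2$ norms, we have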

M 

\\ilNM-l3tLt(»lM)Gt\\l < Y.{l~rm,tf\\G„,4^ < {l^rtfWGtWl, (B.4) 

m — 1 

where {rt} is the {Tt+i} adapted process given by rt = ri,i A r2,t A ■ ■ ■ A rM,t- By the 
conditional Jensen's inequality, we obtain 

Ee4rt\Tt] > A*LiEe.[r™,f|J^t] > Cr/{t + iy^ (B.5) 
for some $c_r > 0$ and $t \ge t_0$. Recall $\{\alpha_t\}$ from (2.5). Using (B.4), we finally get

\\{InM - PtLt ® Im - OltlNM)Gt\\F <\\{InM - PtLt ® lM)Gt\\F + Ctt\\Gt\\F 

<{l-rt)\\Gt\\F + ct\\Gt\\F 
<(l-n/2) \\Gt\\F 



(B.6) 



for t>to. From (B.3l and (B.6 1 wc then have 



l|Gt+i||F < UlNM-l3tLt<E)lM-atlNM)Gt\\F + atUt < {I - rt/2) \\Gt\\F + ^tUf (B.7) 



By (B.5) and since $\beta_t/\alpha_t \to \infty$ as $t \to \infty$, the recursion in (B.7) clearly falls under the purview of Lemma 4.3 and we conclude that $\|\widetilde{G}_t\|_F \to 0$ a.s. as $t \to \infty$. The convergence in the $\ell_2$ norm follows immediately. □

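Proof. [Proof of Lemma 5.3] The process $\{G_{avg}(t)\}$ satisfies the following recursion:

\[ G_{avg}(t+1) = (1 - \alpha_t) G_{avg}(t) + \alpha_t D_{avg}(t). \]

Let $\widetilde{G}_{avg}(t)$ denote the residual $G_{avg}(t) - \Sigma_c$; the process $\{\widetilde{G}_{avg}(t)\}$ satisfies

\[ \widetilde{G}_{avg}(t+1) = (1 - \alpha_t) \widetilde{G}_{avg}(t) + \alpha_t \big( D_{avg}(t) - \Sigma_c \big). \tag{B.8} \]

By Eqn. 139, Lemma 25 in [B] there exist $t_0$ sufficiently large and a constant B such that

\[ \sum_{k=s}^{t-1} \Big( \prod_{l=k+1}^{t-1} (1 - \alpha_l) \Big) \alpha_k \le B \]

for all positive integers t and s with $t_0 \le s < t$. Also, the convergence of the sample covariances and the fact that $\gamma_t \to 0$ as $t \to \infty$ imply $D_{avg}(t) \to \Sigma_c$ a.s. as $t \to \infty$. Hence, for a given $\varepsilon > 0$, we may choose $t_\varepsilon > t_0$ such that $\|D_{avg}(t) - \Sigma_c\| \le \varepsilon$ for all $t \ge t_\varepsilon$. From (B.8), we then have for $t > t_\varepsilon$

\[ \begin{aligned} \|\widetilde{G}_{avg}(t)\| &\le \Big( \prod_{k=t_\varepsilon}^{t-1} (1 - \alpha_k) \Big) \|\widetilde{G}_{avg}(t_\varepsilon)\| + \varepsilon \sum_{k=t_\varepsilon}^{t-1} \Big( \prod_{l=k+1}^{t-1} (1 - \alpha_l) \Big) \alpha_k \\ &\le \Big( \prod_{k=t_\varepsilon}^{t-1} (1 - \alpha_k) \Big) \|\widetilde{G}_{avg}(t_\varepsilon)\| + B \varepsilon. \end{aligned} \tag{B.9} \]

Since $\sum_{t \ge 0} \alpha_t = \infty$, the first term on the right hand side of (B.9) goes to zero as $t \to \infty$, and we have $\limsup_{t \to \infty} \|\widetilde{G}_{avg}(t)\| \le B \varepsilon$. Since $\varepsilon > 0$ is arbitrary, we conclude that $\widetilde{G}_{avg}(t) \to 0$ a.s. as $t \to \infty$ by taking $\varepsilon$ to zero. The desired assertion follows immediately. □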
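
The averaging mechanism behind this proof can be seen in a scalar toy version of the recursion for $G_{avg}(t)$. The Python sketch below is an illustration under assumptions that are not taken from the paper: a scalar stand-in `SIGMA_C` for the limit, the weight $\alpha_t = 1/(t+1)$, and a deterministic driving sequence $D(t) = \Sigma_c + 1/(t+1)$ that converges to the limit.

```python
def averaged_grammian(limit, horizon, g0=0.0):
    """Iterate G(t+1) = (1 - a_t) G(t) + a_t D(t) with a_t = 1/(t+1)."""
    g = g0
    for t in range(horizon):
        alpha = 1.0 / (t + 1)
        d = limit + 1.0 / (t + 1)  # assumed driving sequence with D(t) -> limit
        g = (1.0 - alpha) * g + alpha * d
    return g


SIGMA_C = 2.0  # hypothetical scalar stand-in for the limiting Grammian
errors = [abs(averaged_grammian(SIGMA_C, T) - SIGMA_C) for T in (10, 1000, 100000)]
```

The error shrinks as the horizon grows because the recursion forms a weighted average of the driving sequence, which is the role played by the bound B and the divergence of $\sum_t \alpha_t$ in the argument above.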

Proof. [Proof of Proposition 5.5] A version of this result was established in [33] (Lemma 6) for the case of constant gains $K_n(t)$. In the following we generalize the arguments of [33] to time-varying adaptive gains. To this end we show

\[ \inf_{\|z\| = 1} z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \mathcal{K} \mathcal{H} \Big) z > 0 \tag{B.10} \]

for all t sufficiently large, where $\mathcal{K} = \mathrm{diag}(K_1, \cdots, K_N)$.

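A vector $z \in \mathbb{R}^{NM}$ may be decomposed as $z = z_{\mathcal{C}} + z_{\mathcal{C}^\perp}$, with $z_{\mathcal{C}}$ denoting its projection on the consensus or agreement subspace $\mathcal{C}$,

\[ \mathcal{C} = \big\{ z \in \mathbb{R}^{NM} : z = 1_N \otimes a \text{ for some } a \in \mathbb{R}^M \big\}, \tag{B.11} \]

and $z_{\mathcal{C}^\perp}$ its projection on the orthogonal complement. Also, denoting by $\mathcal{D}_{\mathcal{K}}$ the symmetrized version of $\mathcal{K} \mathcal{H}$, i.e., $\mathcal{D}_{\mathcal{K}} = \frac{1}{2} \big( \mathcal{K} \mathcal{H} + \mathcal{H}^T \mathcal{K}^T \big)$, standard matrix manipulations and properties of the Laplacian yield

\[ z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \mathcal{K} \mathcal{H} \Big) z = \frac{\beta_t}{\alpha_t} z_{\mathcal{C}^\perp}^T (\overline{L} \otimes I_M) z_{\mathcal{C}^\perp} + z_{\mathcal{C}^\perp}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}^\perp} + 2 z_{\mathcal{C}}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}^\perp} + z_{\mathcal{C}}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}}. \tag{B.12} \]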
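By construction, $\sum_{n=1}^{N} K_n H_n = \Sigma_c^{-1} \sum_{n=1}^{N} H_n^T R_n^{-1} H_n = N I_M$, and, hence, we note that $z_{\mathcal{C}}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}} = \|z_{\mathcal{C}}\|^2$ for each $z \in \mathbb{R}^{NM}$. Let us choose a constant $c_1 > 0$ such that

\[ z_{\mathcal{C}^\perp}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}^\perp} \ge -c_1 \|z_{\mathcal{C}^\perp}\|^2 \quad \text{and} \quad z_{\mathcal{C}}^T \mathcal{D}_{\mathcal{K}} z_{\mathcal{C}^\perp} \ge -c_1 \|z_{\mathcal{C}}\| \, \|z_{\mathcal{C}^\perp}\|. \]

It then follows from (B.12) that

\[ z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \mathcal{K} \mathcal{H} \Big) z \ge \Big( \frac{\beta_t}{\alpha_t} \lambda_2(\overline{L}) - c_1 \Big) \|z_{\mathcal{C}^\perp}\|^2 - 2 c_1 \|z_{\mathcal{C}}\| \, \|z_{\mathcal{C}^\perp}\| + \|z_{\mathcal{C}}\|^2. \tag{B.13} \]

Since $\beta_t/\alpha_t \to \infty$ and $\lambda_2(\overline{L}) > 0$, there exists $t_1$ sufficiently large such that

\[ \frac{\beta_t}{\alpha_t} \lambda_2(\overline{L}) - c_1 > c_1^2, \quad \forall t \ge t_1. \tag{B.14} \]

We now verify (B.10) for $t \ge t_1$. To this end, assume $\|z\| = 1$. In case $z_{\mathcal{C}} = 0$ ($\|z_{\mathcal{C}^\perp}\| = 1$), we have from (B.13)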



Pi 
at 



L(SIm+ICH]z> —\2{L) - ci > 



For the other case, i.e., zc 7^ 0, 



at 



L0 Im +ICH] z> \\zc 



at 



-A2(L)-ci 
at 



l|zc| 



2ciJ 



l|zc| 



>0, 



where the last inequality follows from the fact that the quadratic functional of y^^y is 
always positive due to the discriminant condition imposed by ( |B.14[ ). We thus conclude that 

/^«T^ r_ _ , „ ^ n {B.15) 



at 



-L (g)lM +JCH]z>0 



for all t > ti and z, such that ||z|| = 1. Since the quadratic form in ( |B.15[ ) is a continuous 
function on the compact unit circle, we may further conclude that 



inf z'^ i (g) hi + ICH] z > C2 > 



(B.16) 



for some positive constant C2, thus verifying the assertion in (B.lOl for all t > ti. To 
complete the proof of Proposition 5.5 choose any < e < C2. It then follows from (B.I61 
and continuity that for t > ti and arbitrary z G K'^*^, 



z' {l3tL<S,lM + atICH)z>at\\z 



inf z' { ^L(S hi +K:U]z 



at 



> (c2 - e) at ||z|| 



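thus verifying the assertion of Proposition 5.5 with $\varepsilon_{\mathcal{K}} = \varepsilon$, $t_{\mathcal{K}} = t_1$ and $c_{\mathcal{K}} = c_2 - \varepsilon$. □

Proof. [Proof of Proposition 5.6] By (B.13) in Proposition 5.5 there exists a constant $c_1 > 0$ such that for arbitrary $z \in \mathbb{R}^{NM}$,

\[ z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \mathcal{K} \mathcal{H} \Big) z \ge \Big( \frac{\beta_t}{\alpha_t} \lambda_2(\overline{L}) - c_1 \Big) \|z_{\mathcal{C}^\perp}\|^2 - 2 c_1 \|z_{\mathcal{C}}\| \, \|z_{\mathcal{C}^\perp}\| + \|z_{\mathcal{C}}\|^2. \]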



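Hence for $\widetilde{\mathcal{K}}$ satisfying (5.2), we have

\[ \begin{aligned} z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \widetilde{\mathcal{K}} \mathcal{H} \Big) z &\ge z^T \Big( \frac{\beta_t}{\alpha_t} \overline{L} \otimes I_M + \mathcal{K} \mathcal{H} \Big) z - \varepsilon \|z\|^2 \\ &\ge \Big( \frac{\beta_t}{\alpha_t} \lambda_2(\overline{L}) - c_1 - \varepsilon \Big) \|z_{\mathcal{C}^\perp}\|^2 - 2 c_1 \|z_{\mathcal{C}}\| \, \|z_{\mathcal{C}^\perp}\| + (1 - \varepsilon) \|z_{\mathcal{C}}\|^2. \end{aligned} \]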



Using the fact that < e < 1, we have 



'^L(SIm+ICH]z> 
at 



2at 



+ 



Cl 



— A2(L) -ci -e - - 
Zat 1 



VT^\\zc\ 



Since X2{L) > and Pt/at — >■ cxj as t — >■ oo, there exists (large enough), such that, 



— A2(-L -Cl - e- 

2at 1 — e 



> 



for all t>te- We may then conclude from (B.19I that 



Cet 



L (S hi + ICH ] z > 



A 

2at 



HL)\\zc 



and, hence 



(ft 



L ® Jm + atJCH z > 



A2(i) 



A he 



for all t > ti:, z € R'^*^ and if satisfying (5.2 1. This establishes the assertion. □ 
Proof. [Proof of Lemma 



5.7 



Let the residual x„(t) — x„{t) — Xavg(i). Then arguments 
along the lines of (B.2l-(B.3l show that the process xt ~ [xf (t),- • • ,x^(f)]"^ satisfies the 



recursion 



Xi+i = {Inm - l^tLt ® hi) Xt + QtZt, 
where the process {zt} is defined as 



Inm - -^Ijv ® (liv ® /a/)^ ) fCt (yt - Hxt) . 



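Since $\mathcal{K}_t \to \mathcal{K}$ as $t \to \infty$, the process $\{x_t\}$ is bounded (Lemma 5.4), and the observation noise $\zeta_t$ satisfies (A.5), there exist two $\mathbb{R}_+$ valued processes: (1) an $\{\mathcal{F}_t\}$-adapted $\{U_t\}$ satisfying $\sup_{t \geq 0}\|U_t\| < \infty$ a.s.; and (2) an i.i.d. $\{\mathcal{F}_{t+1}\}$-adapted $\{J_t\}$, with $J_t$ independent of $\mathcal{F}_t$ for each $t$ and $\mathbb{E}_{\theta^*}\left[\|J_t\|^{2+\varepsilon_1}\right] < \infty$, such that

$$\|\widetilde{z}_t\| \leq U_t(1 + J_t).$$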



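Since $\widetilde{x}_t \in \mathcal{C}^{\perp}$ for all $t$, by Lemma 4.4 there exists an $\{\mathcal{F}_{t+1}\}$-adapted $\mathbb{R}_+$ valued process $\{r_t\}$ with $0 \leq r_t \leq 1$ a.s. such that

$$\left\|\left(I_{NM} - \beta_t L_t \otimes I_M - P_{NM}\right)\widetilde{x}_t\right\| \leq (1 - r_t)\|\widetilde{x}_t\|$$

for all $t$ (large enough) and a constant $c_r > 0$ such that for all $t$

$$\mathbb{E}_{\theta^*}\left[r_t \mid \mathcal{F}_t\right] \geq \frac{c_r}{(t+1)^{\tau_2}} \quad \text{a.s.}$$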

From the above development we conclude that 

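$$\|\widetilde{x}_{t+1}\| \leq (1 - r_t)\|\widetilde{x}_t\| + \alpha_t U_t(1 + J_t) \qquad \text{(B.20)}$$

for all $t$ (large enough). The recursion (B.20) clearly falls under the purview of Lemma 4.3 and we have the assertion

$$\mathbb{P}_{\theta^*}\left(\lim_{t \to \infty}(t+1)^{\tau_0}\,\widetilde{x}_t = 0\right) = 1$$

for all $\tau_0 \in \left[0,\; \tau_1 - \tau_2 - \frac{1}{2+\varepsilon_1}\right)$. This establishes the claim. □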
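To build intuition for the rate obtained from a recursion of the type (B.20), the following minimal Python sketch iterates a deterministic scalar surrogate $x_{t+1} = (1 - r_t)x_t + \alpha_t u$ with $r_t = c_r/(t+1)^{\tau_2}$ and $\alpha_t = (t+1)^{-\tau_1}$. This is an illustration only: the noise $J_t$ is dropped, so the $1/(2+\varepsilon_1)$ correction in the admissible range of $\tau_0$ does not appear, and the function name and all numerical values are assumptions rather than quantities from the paper.

```python
def scaled_residual(tau0, tau1=1.0, tau2=0.3, c_r=0.5, u=1.0, steps=100_000):
    """Iterate x_{t+1} = (1 - r_t) x_t + alpha_t * u and record (t+1)^tau0 * x_t.

    r_t = c_r / (t+1)^tau2 mimics the contraction supplied by the consensus
    term, alpha_t = (t+1)^(-tau1) mimics the innovation weight, and u is a
    constant stand-in for the bounded perturbation U_t (noise J_t omitted).
    """
    x = 1.0
    scaled = []
    for t in range(steps):
        r_t = c_r / (t + 1) ** tau2
        alpha_t = 1.0 / (t + 1) ** tau1
        scaled.append((t + 1) ** tau0 * x)
        x = (1.0 - r_t) * x + alpha_t * u
    return scaled


# tau0 below tau1 - tau2 = 0.7: the scaled residual keeps shrinking.
decaying = scaled_residual(0.5)
# tau0 above tau1 - tau2: the scaling is too aggressive and the product grows.
growing = scaled_residual(0.9)
```

In this surrogate the residual tracks $\alpha_t u / r_t$, which decays like $(t+1)^{-(\tau_1 - \tau_2)}$, so the scaled sequence vanishes exactly when $\tau_0 < \tau_1 - \tau_2$.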
Proof of Lemma 5.9.

We will use the following approximation result from [42] in the proof.

Proposition B.1 (Lemma 4.3 in [43]). Let $\{b_t\}$ be a scalar sequence satisfying

$$b_{t+1} \leq \left(1 - \frac{c}{t+1}\right) b_t + d_t (t+1)^{-\tau},$$

where $c > \tau$, $\tau > 0$, and the sequence $\{d_t\}$ is summable. Then $\limsup_{t \to \infty}(t+1)^{\tau} b_t < \infty$.
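The role of the condition $c > \tau$ can be seen in a small numerical experiment. The sketch below is an illustration under assumed values, not a proof: it runs the recursion with equality, $b_{t+1} = (1 - c/(t+1))\,b_t + d_t (t+1)^{-\tau}$, using the hypothetical summable choice $d_t = (t+1)^{-2}$, and records the scaled values $(t+1)^{\tau} b_t$.

```python
def scaled_sequence(c=0.9, tau=0.5, b0=1.0, steps=10_000):
    """Return the values (t+1)^tau * b_t for the recursion run with equality.

    d_t = (t+1)^(-2) is an assumed summable perturbation; c, tau and b0 are
    illustrative values only.
    """
    b = b0
    out = []
    for t in range(steps):
        out.append((t + 1) ** tau * b)
        d_t = 1.0 / (t + 1) ** 2
        b = (1.0 - c / (t + 1)) * b + d_t / (t + 1) ** tau
    return out


bounded = scaled_sequence(c=0.9, tau=0.5)    # c > tau: scaled values stay bounded
unbounded = scaled_sequence(c=0.3, tau=0.5)  # c < tau: scaled values drift upward
```

With $c > \tau$ the scaled sequence remains bounded (here it in fact decays), whereas with $c < \tau$ the homogeneous part decays only like $(t+1)^{-c}$ and the scaled sequence grows slowly.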

The following generalized convergence criterion of dependent stochastic sequences will 
also be useful. 

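Proposition B.2 (Lemma 10 in [43]). Let $\{J_t\}$ be an $\mathbb{R}$ valued $\{\mathcal{F}_{t+1}\}$ adapted process such that $\mathbb{E}[J_t \mid \mathcal{F}_t] = 0$ a.s. for each $t \geq 1$. Then the sum $\sum_{t \geq 0} J_t$ exists and is finite a.s. on the set where $\sum_{t \geq 0} \mathbb{E}[J_t^2 \mid \mathcal{F}_t]$ is finite.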
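A simple instance of this criterion can be simulated. In the Python sketch below (an illustration with assumed ingredients, not taken from the paper), $J_t = s_t/(t+1)$ with $s_t$ independent fair $\pm 1$ signs, so that $\mathbb{E}[J_t \mid \mathcal{F}_t] = 0$ and $\sum_t \mathbb{E}[J_t^2 \mid \mathcal{F}_t] = \sum_t (t+1)^{-2} < \infty$ on the whole sample space. The partial sums should therefore settle.

```python
import random


def partial_sums(steps=100_000, seed=0):
    """Partial sums of J_t = s_t / (t + 1), with s_t = +/-1 fair coin flips.

    The increments have zero conditional mean and summable conditional
    second moments, so the hypotheses of the criterion hold everywhere.
    """
    rng = random.Random(seed)
    total = 0.0
    sums = []
    for t in range(steps):
        total += rng.choice((-1.0, 1.0)) / (t + 1)
        sums.append(total)
    return sums


sums = partial_sums()
# Movement of the partial sums over the second half of the horizon.
tail_movement = abs(sums[-1] - sums[len(sums) // 2 - 1])
```

The variance of the increment accumulated over the second half of the horizon is about $10^{-5}$, so `tail_movement` is tiny, consistent with almost sure convergence of the series.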



Proof. [Proof of Lemma 5.9] For each $\delta > 0$ recall the construction in (5.8)-(5.10). Clearly, it suffices by the arguments in Lemma 5.8 to establish the required convergence rate claim for each of the processes $\{z_t^{\delta}\}$.
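Let $\bar{\tau} \in [0, 1/2)$ be such that

$$\mathbb{P}_{\theta^*}\left(\lim_{t \to \infty}(t+1)^{\bar{\tau}}\,\|z_t\| = 0\right) = 1$$

for all $n$. Such a $\bar{\tau}$ always exists by Lemma 5.8. We now show that there exists $\tau$ such that $\bar{\tau} < \tau < 1/2$ for which the claim holds. To this end, choose $\hat{\tau} \in (\bar{\tau}, 1/2)$ and let $\mu = \frac{1}{2}(\hat{\tau} + \bar{\tau})$. For each $\delta > 0$ recall the construction in (5.8)-(5.10), and note that the $\{\mathcal{F}_t\}$-adapted process $\{z_t^{\delta}\}$ satisfies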



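$$\|z_{t+1}^{\delta}\|^2 \leq \|I_M - \alpha_t\Gamma_t^{\delta}\|^2\,\|z_t^{\delta}\|^2 + \alpha_t^2\|U_t\|^2 + \alpha_t^2\|J_t^{\delta}\|^2 + 2\alpha_t\,(z_t^{\delta})^{\top}\left(I_M - \alpha_t\Gamma_t^{\delta}\right)^{\top} J_t^{\delta} + 2\alpha_t\|U_t\|\left(\|I_M - \alpha_t\Gamma_t^{\delta}\|\,\|z_t^{\delta}\| + \alpha_t\|J_t^{\delta}\|\right). \qquad \text{(B.21)}$$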



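Since $\tau_1 > \tau_2 + 1/(2 + \varepsilon_1) + 1/2$, by Lemma 5.7 and (5.9) the process $\{U_t\}$ may be chosen such that³

$$U_t = o\left((t+1)^{-1/2}\right). \qquad \text{(B.22)}$$

Since $\|z_t^{\delta}\| = o\left((t+1)^{-\bar{\tau}}\right)$ (by hypothesis), we obtain

$$2\alpha_t\|U_t\|\,\|I_M - \alpha_t\Gamma_t^{\delta}\|\,\|z_t^{\delta}\| = o\left((t+1)^{-3/2-\bar{\tau}}\right).$$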



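The existence of the second moment of the observation noise process and the boundedness of $\{\mathcal{K}_t\}$ imply

$$\lim_{t \to \infty}(t+1)^{-1/2-\varepsilon}\,\|J_t^{\delta}\| = 0 \quad \text{a.s.}$$

for each $\varepsilon > 0$ and, hence,

$$2\alpha_t^2\,\|U_t\|\,\|J_t^{\delta}\| = o\left((t+1)^{-3/2-\bar{\tau}}\right). \qquad \text{(B.23)}$$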



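Since $2\mu = \hat{\tau} + \bar{\tau}$ and $\hat{\tau} < 1/2$, by (B.23) we note that

$$\sum_{t \geq 0}(t+1)^{2\mu}\, 2\alpha_t^2\,\|U_t\|\,\|J_t^{\delta}\| < \infty.$$

Similarly we have

$$\sum_{t \geq 0}(t+1)^{2\mu}\,\alpha_t\|U_t\|\,\|I_M - \alpha_t\Gamma_t^{\delta}\|\,\|z_t^{\delta}\| < \infty, \qquad \sum_{t \geq 0}(t+1)^{2\mu}\,\alpha_t^2\|U_t\|^2 < \infty.$$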



³For $\mathbb{R}_+$ valued sequences $\{f_t\}$ and $\{g_t\}$ the notation $f_t = o(g_t)$ means that $f_t/g_t \to 0$ as $t \to \infty$. For stochastic sequences the $o(\cdot)$ is to be interpreted a.s. or pathwise.
Now consider the terms $\alpha_t^2\|J_t^{\delta}\|^2$. Since the second moment of the observation noise process exists, $\{\mathcal{K}_t\}$ is uniformly bounded and $2\mu < 1$, it can be shown that

$$\sum_{t \geq 0}(t+1)^{2\mu}\,\alpha_t^2\,\|J_t^{\delta}\|^2 < \infty.$$



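Now let $\{W_t^{\delta}\}$ denote the $\{\mathcal{F}_{t+1}\}$-adapted sequence given by

$$W_t^{\delta} = \alpha_t\,(z_t^{\delta})^{\top}\left(I_M - \alpha_t\Gamma_t^{\delta}\right)^{\top} J_t^{\delta}.$$

We note that $\mathbb{E}_{\theta^*}[W_t^{\delta} \mid \mathcal{F}_t] = 0$ for all $t$ and (at least for $t$ large) we have $\mathbb{E}_{\theta^*}[(W_t^{\delta})^2 \mid \mathcal{F}_t] \leq \alpha_t^2\,\|z_t^{\delta}\|^2\,\mathbb{E}_{\theta^*}[\|J_t^{\delta}\|^2 \mid \mathcal{F}_t]$. Since the second moment of the observation noise process exists and $\{\mathcal{K}_t\}$ is uniformly bounded, the conditional second moments $\mathbb{E}_{\theta^*}[\|J_t^{\delta}\|^2 \mid \mathcal{F}_t]$ are uniformly bounded. Hence

$$\mathbb{E}_{\theta^*}\left[(t+1)^{4\mu}\,(W_t^{\delta})^2 \mid \mathcal{F}_t\right] = o\left((t+1)^{-2-2\bar{\tau}+4\mu}\right) = o\left((t+1)^{-2+2\hat{\tau}}\right). \qquad \text{(B.24)}$$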



Since $2\hat{\tau} < 1$, the sequence on the left hand side of (B.24) is summable and by Proposition B.2 we conclude that $\sum_{t \geq 0}(t+1)^{2\mu} W_t^{\delta}$ exists and is finite. Since $\Gamma_t^{\delta} \to I_M$ uniformly and $\alpha_t \to 0$ as $t \to \infty$, we have

$$\|I_M - \alpha_t\Gamma_t^{\delta}\|^2 \leq \left(1 - a(t+1)^{-1}\right) \qquad \text{(B.25)}$$

for all $t$ large enough. Thus (eventually) we have from (B.21)

$$\|z_{t+1}^{\delta}\|^2 \leq \left(1 - a(t+1)^{-1}\right)\|z_t^{\delta}\|^2 + d_t(t+1)^{-2\mu},$$

where the term $d_t(t+1)^{-2\mu}$ corresponds to all the residuals. Moreover, by (B.22)-(B.25) the limit $\lim_{t \to \infty}\sum_{s=0}^{t} d_s$ exists and is finite. Since $a > 1 > 2\mu$, an immediate application of Proposition B.1 yields

$$\limsup_{t \to \infty}(t+1)^{2\mu}\,\|z_t^{\delta}\|^2 < \infty \quad \text{a.s.}$$

Hence, there exists $\tau$ with $\bar{\tau} < \tau < \mu$, such that $(t+1)^{\tau}\|z_t^{\delta}\| \to 0$ a.s. as $t \to \infty$. Since the above holds for all $\delta > 0$, we conclude that $(t+1)^{\tau}\|z_t\| \to 0$ a.s. as $t \to \infty$. Thus, for every $\bar{\tau}$ for which the convergence in (5.16) holds there exists $\tau \in (\bar{\tau}, 1/2)$ for which the convergence continues to hold. Hence, by induction we conclude that the required convergence holds for all $\tau \in [0, 1/2)$. □
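The bootstrapping of the exponent in the argument above can be traced with a few lines of Python. The sketch below is an idealized illustration, not the proof itself: starting from $\bar{\tau} = 0$, each round picks some $\hat{\tau} \in (\bar{\tau}, 1/2)$, forms $\mu = \tfrac{1}{2}(\hat{\tau} + \bar{\tau})$, and moves $\bar{\tau}$ to a value strictly below $\mu$. The particular rules for choosing $\hat{\tau}$ and the new $\bar{\tau}$ (the parameters `shrink` and `keep`) are assumptions made only for this example.

```python
def bootstrap_exponents(rounds=30, shrink=0.5, keep=0.9):
    """Trace the improvement of the rate exponent tau_bar toward 1/2."""
    tau_bar = 0.0
    history = [tau_bar]
    for _ in range(rounds):
        # Any point of (tau_bar, 1/2) is admissible; this rule is an assumption.
        tau_hat = tau_bar + shrink * (0.5 - tau_bar)
        mu = 0.5 * (tau_hat + tau_bar)
        # Exponent bookkeeping: -2 - 2*tau_bar + 4*mu equals -2 + 2*tau_hat.
        assert abs((-2 - 2 * tau_bar + 4 * mu) - (-2 + 2 * tau_hat)) < 1e-12
        # Summability of the martingale term requires 2*tau_hat < 1.
        assert 2 * tau_hat < 1
        # The argument yields every exponent strictly below mu; take one such value.
        tau_bar = tau_bar + keep * (mu - tau_bar)
        history.append(tau_bar)
    return history


exponents = bootstrap_exponents()
```

The sequence `exponents` is strictly increasing, stays below $1/2$, and approaches $1/2$ geometrically, which mirrors why the rate extends to every $\tau \in [0, 1/2)$.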